Abstract. Computing mountain passes is a standard way of finding critical
points. We describe a numerical method for finding critical points that
is convergent in the nonsmooth case and locally superlinearly convergent in 
the smooth finite dimensional case. We apply these techniques to describe a 
strategy for the Wilkinson problem of calculating the distance of a matrix to 
a closest matrix with repeated eigenvalues. Finally, we relate critical points of 
mountain pass type to nonsmooth and metric critical point theory.

1. Introduction

Computing mountain passes is an important problem in computational chemistry 
and in the study of nonlinear partial differential equations. We begin with the 
following definition. 

Definition 1.1. Let $X$ be a topological space, and consider $a, b \in X$. For a function
$f : X \to \mathbb{R}$, define a mountain pass $p^* \in \Gamma(a, b)$ to be a minimizer of the problem

$$\inf_{p \in \Gamma(a,b)} \ \sup_{0 \le t \le 1} f \circ p(t).$$

Here, $\Gamma(a, b)$ is the set of continuous paths $p : [0, 1] \to X$ such that $p(0) = a$ and
$p(1) = b$.
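To make the inf-sup in Definition 1.1 concrete, the following minimal sketch compares the maximum of a function along two candidate paths. The double-well function `f`, the endpoints and the helper `max_along_polyline` are illustrative assumptions and do not come from the paper. The mountain pass value is the infimum of such path maxima over all continuous paths.

```python
def f(x, y):
    # Hypothetical double-well test function (an assumption, not from the paper):
    # minima at (-1, 0) and (1, 0), saddle at (0, 0) with value 1.
    return (x * x - 1.0) ** 2 + y * y


def max_along_polyline(points, samples_per_segment=1000):
    """Approximate sup_t f(p(t)) for the piecewise linear path through `points`."""
    best = float("-inf")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        for k in range(samples_per_segment + 1):
            t = k / samples_per_segment
            best = max(best, f(x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return best


a, b = (-1.0, 0.0), (1.0, 0.0)
straight = max_along_polyline([a, b])             # passes through the saddle
detour = max_along_polyline([a, (0.0, 1.0), b])   # climbs higher than necessary
print(straight, detour)
```

For this test function the straight path attains the smaller maximum, so it is the better candidate of the two. Sampling only approximates the supremum along a path.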

Date: June 22, 2010.

An important problem in computational chemistry is to find the lowest energy
needed to transition between two stable states. If $a$ and $b$ represent two states and $f$ maps
the states to their potential energies, then the mountain pass problem calculates
this lowest energy. Early work on computing transition states includes Sinclair and
Fletcher [38], and recent work is reviewed by Henkelman, Jóhannesson and Jónsson
[21]. We refer to this paper for further references in the computational chemistry
literature.

Perhaps more importantly, the mountain pass idea is also a useful tool in the
analysis of nonlinear partial differential equations. For a Banach space $X$, variational
problems are problems (P) such that there exists a smooth functional $J : X \to \mathbb{R}$
whose critical points (points where $\nabla J = 0$) are solutions of (P). Many partial
differential equations are variational problems, and critical points of $J$ are "weak"
solutions. In the landmark paper by Ambrosetti and Rabinowitz [4], the mountain
pass theorem gives a sufficient condition for the existence of critical points in
infinite dimensional spaces. If an optimal path to solve the mountain pass problem
exists and the maximum along the path is greater than $\max(f(a), f(b))$, then the
maximizer on the path is a critical point distinct from $a$ and $b$. The mountain
pass theorem and its variants are the primary ways to establish the existence of
critical points and to find critical points numerically. For more on the mountain
pass theorem and some of its generalizations, we refer the reader to [24].

In [13], Choi and McKenna proposed a numerical algorithm for the mountain
pass problem by using an idea from Aubin and Ekeland [5] to solve a semilinear
partial differential equation. This is extended to find solutions of Morse index 2
(that is, the maximum dimension of the subspace of $X$ on which $J''$ is negative
definite) in Ding, Costa and Chen [Tn|, and then to higher Morse index by Li and
Zhou l^.

Li and Zhou [27], and Yao and Zhou [45] proved convergence results to show that 
their minimax method is sound for obtaining weak solutions to nonlinear partial 
differential equations. Moré and Munson [33] proposed an "elastic string method",
and proved that the sequence of paths created by the elastic string method contains 
a limit point that is a critical point. 

The prevailing methods for numerically solving the mountain pass problem are
motivated by finding a sequence of paths (by discretization or otherwise) such
that the maximum along these paths decreases to the optimal value. Indeed, many
methods in [21] approximate a mountain pass in this manner. As far as we are
aware, only [SI [H] deviate from this strategy. We make use of a different approach
by looking at the path connected components of the lower level sets of $f$ instead.

One easily sees that $l$ is a lower bound of the mountain pass problem if and only
if $a$ and $b$ lie in two different path connected components of $\mathrm{lev}_{<l} f$. A strategy to
find an optimal mountain pass is to start with a lower bound $l$ and keep increasing
$l$ until the path connected components of $\mathrm{lev}_{<l} f$ containing $a$ and $b$ respectively
coalesce at some point. However, this strategy requires one to determine whether
the points $a$ and $b$ lie in the same path connected component, which is not easy.
We turn to finding saddle points of mountain pass type, as defined below.

Definition 1.2. For a function $f : X \to \mathbb{R}$, a saddle point of mountain pass type
$\bar{x} \in X$ is a point such that there exists an open set $U$ such that $\bar{x}$ lies in the closure
of two path components of $(\mathrm{lev}_{<f(\bar{x})} f) \cap U$.

We shall refer to saddle points of mountain pass type simply as saddle points.
As an example, for the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x) = x_1^2 - x_2^2$, the point
$0$ is a saddle point of mountain pass type: We can choose $U = \mathbb{R}^2$, $a = (0,1)$,
$b = (0,-1)$. When $f$ is $C^1$, it is clear that saddle points are critical points. As we
shall see later (in Propositions 6.1 and 6.2), saddle points of mountain pass type
can, under reasonable conditions, be characterized as maximal points on mountain
passes, acting as "bottlenecks" between two components. In fact, if $f$ is $C^2$, the
Hessians are nonsingular and several mild assumptions hold, these bottlenecks are
exactly critical points of Morse index 1. We refer the reader to the lecture notes
by Ambrosetti [3]. Some of the methods in [21] actually find saddle points instead
of solving the mountain pass problem.

We propose numerical methods to find saddle points using the strategy suggested
in Definition 1.2. We start with a lower bound $l$ and keep increasing $l$ until the
components of the level set $(\mathrm{lev}_{\le l} f) \cap U$ containing $a$ and $b$ respectively coalesce,
reaching the objective of the mountain pass problem. The first method we propose
in Algorithm 2.1 is purely metric in nature. One appealing property of this method
is that calculations are now localized near the critical point and we keep track of
only two points instead of an entire path. Our algorithm enjoys a monotonicity
property: The distance between two components decreases monotonically as the
algorithm progresses, giving an indication of how close we are to the saddle point.
In a practical implementation, local optimality properties in terms of the gradients
(or generalized gradients) can be helpful for finding saddle points. Such optimality
conditions are covered in Section 9.

It follows from the definitions that our algorithm, if it converges, converges to
a saddle point. We then prove that any saddle point is deformationally critical in
the sense of metric critical point theory |17[ [SS] [53], and is Morse critical under
additional conditions. This implies in particular that any saddle point is Clarke
critical in the sense of nonsmooth critical point theory [12, 37] based on nonsmooth
analysis in the spirit of [U [M] [32] [36]. It seems that there are few existing numerical
methods for finding either critical points in a metric space or nonsmooth critical
points. Currently, we are only aware of [44].

One of the main contributions of this paper is to give a second method (in
Section 3) which converges locally superlinearly to a nondegenerate smooth critical
point, i.e., a critical point where the Hessian is nonsingular, in $\mathbb{R}^n$. A potentially
difficult step in this second method is that we have to find the closest point between
two components of the level sets. While the effort needed to perform this step
accurately may be great, the purpose of this step is to make sure that the problem is
well aligned after this step. Moreover, this step need not be performed to optimality.
In our numerical example in Section 8, we were able to obtain favorable results
without performing this step.

Our initial interest in the mountain pass problem came from computing the
2-norm distance of a matrix $A$ to the closest matrix with repeated eigenvalues. This
is also known as the Wilkinson problem, and this value is the smallest 2-norm
perturbation that will make the eigenvalues of the matrix $A$ behave in a non-Lipschitz
manner. Alam and Bora showed how the Wilkinson problem can be reduced
to a global mountain pass problem. We do not solve the global mountain pass
problem associated with the Wilkinson problem, but we demonstrate that locally
our algorithm converges quickly to a smooth critical point of mountain pass type.

Outline: Section 2 illustrates a local algorithm to find saddle points of mountain
pass type, while Sections 3, 4 and 5 are devoted to the statement, proof of
convergence, and additional observations of a fast local algorithm to find nondegenerate
critical points of Morse index 1 in $\mathbb{R}^n$.

Section 6 discusses the relationship between mountain passes, saddle points, and
critical points in the sense of metric critical point theory and nonsmooth analysis,
and does not depend on material in Sections 3, 4 and 5.

Finally, Sections 7 and 8 illustrate the fast local algorithm in Section 3. Section
9 discusses optimality conditions for the subproblem in the algorithm in Section 2.

Notation: As we will encounter situations where we want to find the square of
the $j$th coordinate of the $i$th iterate of $x$, we write $x_i^2(j)$ in the proof of Theorem
4.8. In other parts, it will be clear from context whether the $i$ in $x_i$ is used as an
iteration counter or as a reference to the $i$th coordinate. Let $\mathbb{B}^n(0, r)$ be the ball
with center $0$ and radius $r$ in $\mathbb{R}^n$, and $\mathring{\mathbb{B}}^n(0, r)$ be the corresponding open ball.

2. A LEVEL SET ALGORITHM 

We present a level set algorithm to find saddle points. Assume $f : X \to \mathbb{R}$, where
$(X, d)$ is a metric space.

Algorithm 2.1. (Level set algorithm) A local bisection method for approximating
a mountain pass from $x_0$ to $y_0$ for $f|_U$, where both $x_0$ and $y_0$ lie in some open path
connected set $U$.

(1) Start with an upper bound $u$ and a lower bound $l$ for the objective of the
mountain pass problem and $i = 0$.

(2) Solve the optimization problem

$$\min \ d(x, y) \quad \text{s.t.} \quad x \in S_1, \ y \in S_2, \tag{2.1}$$

where $S_1$ is the component of the level set $(\mathrm{lev}_{\le \frac{1}{2}(l+u)} f) \cap U$ that contains
$x_i$ and $S_2$ is the component that contains $y_i$.

(3) If $S_1$ and $S_2$ are the same component, then $\frac{1}{2}(l + u)$ is an upper bound,
otherwise it is a lower bound. Update the upper and lower bounds
accordingly. In the case where the lower bound is changed, increase $i$ by 1,
and let $x_i$ and $y_i$ be the minimizers of (2.1). For future discussions, let $l_i$ be the
value of $l$ corresponding to $x_i$ and $y_i$. Repeat step 2 until $x_i$ and $y_i$ are
sufficiently close.

(4) If an actual approximate mountain pass is desired, take a path $p_i : [0, 1] \to
U \cap (\mathrm{lev}_{\le u} f)$ connecting the points

$$x_0, x_1, \dots, x_{i-2}, x_{i-1}, x_i, y_i, y_{i-1}, y_{i-2}, \dots, y_1, y_0.$$

Step (3) is illustrated in Figure 2.1.

To start the algorithm, an upper bound $u$ can be taken to be the maximum of
$f$ along any path from $x_0$ to $y_0$, while a lower bound can be the maximum of $f(x_0)$ and
$f(y_0)$. In fact, in step (3), we may update the upper bound $u$ to be the maximum
along the line segment joining $x_i$ and $y_i$ if it is a better upper bound.
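The bisection on the level in step (3) can be illustrated with a brute-force sketch. This is not the metric algorithm of the paper: instead of solving the distance subproblem, it decides whether the two endpoints lie in the same component of a sublevel set by a breadth-first search on a grid. The test function `f`, the grid parameters and all function names are assumptions made for illustration.

```python
from collections import deque


def f(x, y):
    # Hypothetical double-well test function; the mountain pass value
    # between the wells (-1, 0) and (1, 0) is 1, attained at the origin.
    return (x * x - 1.0) ** 2 + y * y


def connected_in_sublevel(level, start, goal, n=81, lo=-2.0, hi=2.0):
    """Breadth-first search on an n-by-n grid over [lo, hi]^2 within {f <= level}."""
    h = (hi - lo) / (n - 1)

    def to_idx(p):
        return (round((p[0] - lo) / h), round((p[1] - lo) / h))

    def inside(i, j):
        return f(lo + i * h, lo + j * h) <= level

    s, g = to_idx(start), to_idx(goal)
    if not (inside(*s) and inside(*g)):
        return False
    seen, queue = {s}, deque([s])
    while queue:
        i, j = queue.popleft()
        if (i, j) == g:
            return True
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < n and 0 <= nj < n and (ni, nj) not in seen and inside(ni, nj):
                seen.add((ni, nj))
                queue.append((ni, nj))
    return False


def bisect_mountain_pass(start, goal, lower, upper, tol=1e-3):
    """Shrink [lower, upper] around the grid approximation of the mountain pass value."""
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        if connected_in_sublevel(mid, start, goal):
            upper = mid   # same component: mid is an upper bound
        else:
            lower = mid   # different components: mid is a lower bound
    return lower, upper


lower, upper = bisect_mountain_pass((-1.0, 0.0), (1.0, 0.0), lower=0.0, upper=4.0)
print(lower, upper)
```

A grid search of this kind scales poorly with dimension, which is one way to see why the paper tracks only two points and works locally instead.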

In practice, one need not solve subproblem (2.1) in step 2 too accurately, as it
might be more profitable to move on to step 3. While theory demands the global
optimizers for subproblem (2.1), an implementation of Algorithm 2.1 can only find
local optimizers, which is not sufficient for the global mountain pass problem, but
can be successful for the purpose of finding saddle points. The optimality conditions
in terms of gradients (or generalized gradients) can be helpful for characterizing
local optimality (see Section 9). Notice that the saddle point property is local. If
$x_i$ and $y_i$ converge to a common limit, then it is clear from the definitions that the
common limit is a saddle point.

Figure 2.1. Illustration of Algorithm 2.1.

Another issue with subproblem (2.1) in step 2 is that minimizers may not
exist. For example, the sets $S_1$ and $S_2$ may not be compact. We now discuss how
convergence to a critical point in Algorithm 2.1 can fail in the finite dimensional
case.

The Palais-Smale condition is important in nonlinear analysis, and is often a
necessary condition in the smooth and nonsmooth mountain pass theorems and
other critical point existence theorems. We refer to |29l [Ml [551 [551 for more
details. We recall its definition.

Definition 2.2. Let $X$ be a Banach space and $f : X \to \mathbb{R}$ be a $C^1$ functional.

We say that a sequence $\{x_i\}_{i=1}^{\infty} \subset X$ is a Palais-Smale sequence if $\{f(x_i)\}_{i=1}^{\infty}$ is
bounded and $f'(x_i) \to 0$, and $f$ satisfies the Palais-Smale condition if any
Palais-Smale sequence admits a convergent subsequence.

For nonsmooth $f$, the condition $f'(x_i) \to 0$ is replaced by $\inf_{x^* \in \partial f(x_i)} |x^*| \to 0$ instead.

In the absence of the Palais-Smale condition, Algorithm 2.1 may fail to converge
because the sequence $\{(x_i, y_i)\}_{i=1}^{\infty}$ need not have a limit point of the form $(\bar{z}, \bar{z})$, or
the sequence $\{(x_i, y_i)\}_{i=1}^{\infty}$ need not even exist. The examples below document the
possibilities.

Example 2.3. (a) Consider / : R defined by f{x,y) — — y"^. Here, 

the distance between the two components of the level sets is zero for all $\mathrm{lev}_{\le c} f$,
where $c < 0$, and $x_i$ and $y_i$ do not exist. The sequence $\{(i, 0)\}_{i=1}^{\infty}$ is a Palais-Smale
sequence but does not converge.

(b) For $f(x,y) = e^{-2x} - y^2 e^{-x}$, $x_i$ and $y_i$ exist, but both $\{x_i\}_{i=1}^{\infty}$ and $\{y_i\}_{i=1}^{\infty}$
do not have finite limits. Again, $\{(i,0)\}_{i=1}^{\infty}$ is a Palais-Smale sequence that does
not converge.
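The Palais-Smale claim in part (b) can be checked numerically. The sketch below assumes that the function in part (b) reads $f(x,y) = e^{-2x} - y^2 e^{-x}$. The gradient is computed by hand from that formula, and the names `f` and `grad_f` are illustrative.

```python
import math


def f(x, y):
    # Assumed reading of the function in Example 2.3(b).
    return math.exp(-2.0 * x) - y * y * math.exp(-x)


def grad_f(x, y):
    # Partial derivatives computed by hand from the formula above.
    fx = -2.0 * math.exp(-2.0 * x) + y * y * math.exp(-x)
    fy = -2.0 * y * math.exp(-x)
    return fx, fy


# Evaluate along the sequence (i, 0), i = 1, ..., 10.
values = [f(float(i), 0.0) for i in range(1, 11)]
grad_norms = [math.hypot(*grad_f(float(i), 0.0)) for i in range(1, 11)]
print(values[-1], grad_norms[-1])
```

The function values stay bounded and the gradient norms decay to zero, while the points $(i, 0)$ themselves escape to infinity, so no subsequence converges.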

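It is possible that $\{x_i\}_{i=1}^{\infty}$ and $\{y_i\}_{i=1}^{\infty}$ have limit points but not a common limit
point. To see this, consider the example $f : \mathbb{R} \to \mathbb{R}$ defined by

$$f(x) = \begin{cases} x & \text{if } x \le -1, \\ -1 & \text{if } -1 \le x \le 1, \\ -x & \text{if } x \ge 1. \end{cases}$$

The set $\mathrm{lev}_{\le -1} f$ is path-connected, but the set $\mathrm{cl}(\mathrm{lev}_{<-1} f)$ is not path-connected.
Any point in the set $(\mathrm{lev}_{\le -1} f) \setminus \mathrm{cl}(\mathrm{lev}_{<-1} f) = (-1, 1)$ is a local minimum, and
hence a critical point.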

3. A LOCALLY SUPERLINEARLY CONVERGENT ALGORITHM 

In this section, we propose a locally superlinearly convergent algorithm for the
mountain pass problem for smooth critical points in $\mathbb{R}^n$. For this section, we take
$X = \mathbb{R}^n$. Like Algorithm 2.1 earlier, we keep track of only two points in the space
$\mathbb{R}^n$ instead of a path. Our fast locally convergent algorithm does not require one
to calculate the Hessian. Furthermore, we maintain upper and lower bounds that
converge superlinearly to the critical value. The numerical performance of this
method will be illustrated in Section 8.

In Algorithm 3.1 below, we can assume that the endpoints $x_0$ and $y_0$ satisfy
$f(x_0) = f(y_0)$. Otherwise, if $f(x_0) < f(y_0)$ say, replace $x_0$ by the point $x_0'$ closest
to $x_0$ on the line segment $[x_0, y_0]$ such that $f(x_0') = f(y_0)$.

Algorithm 3.1. (Fast local level set algorithm) Find saddle point between points
$x_0$ and $y_0$ for $f : \mathbb{R}^n \to \mathbb{R}$. Assume that the objective of the mountain pass problem
between $x_0$ and $y_0$ is greater than $f(x_0)$, and $f(x_0) = f(y_0)$. Let $U$ be a convex set
containing $x_0$ and $y_0$.

(1) Given points $x_i$ and $y_i$, find $z_i$ as follows:

(a) Replace $x_i$ and $y_i$ by $\tilde{x}_i$ and $\tilde{y}_i$, where $\tilde{x}_i$ and $\tilde{y}_i$ are minimizers of the
problem

$$\begin{array}{ll} \min_{x,y} & |x - y| \\ \text{s.t.} & x \text{ in same component as } x_i \text{ in } (\mathrm{lev}_{\le f(x_i)} f) \cap U \\ & y \text{ in same component as } y_i \text{ in } (\mathrm{lev}_{\le f(y_i)} f) \cap U. \end{array}$$

(b) Find a minimizer of $f$ on $L_i \cap U$, say $z_i$. Here $L_i$ is the affine space
orthogonal to $x_i - y_i$ passing through $\frac{1}{2}(x_i + y_i)$.

(2) Find the point furthest away from $x_i$ on the line segment $[x_i, z_i]$, which we
call $x_{i+1}$, such that $f(x) \le f(z_i)$ for all $x$ in the line segment $[x_i, x_{i+1}]$. Do
the same to find $y_{i+1}$.

(3) Increase $i$, repeat steps 1 and 2 until $|x_i - y_i|$ is small, or if the value
$M_i - f(z_i)$, where $M_i := \max_{x \in [x_i, y_i]} f(x)$, is small.

(4) If an actual path is desired, take a path $p_i : [0, 1] \to X$ lying in $\mathrm{lev}_{\le M_i} f$
connecting the points

$$x_0, x_1, \dots, x_{i-2}, x_{i-1}, x_i, y_i, y_{i-1}, y_{i-2}, \dots, y_1, y_0.$$

As we will see in Propositions 4.3 and 5.4, a unique minimizing pair $(\tilde{x}_i, \tilde{y}_i)$ in step
1(a) exists under added conditions. Furthermore, Proposition 4.5 implies that a
unique minimizer of $f$ on $L_i \cap U$ exists under added conditions in step 1(b).

To motivate step 1(b), consider any path from $x_i$ to $y_i$ that lies wholly in
$U$. Such a path has to pass through some point of $L_i \cap U$, so the maximum value
of $f$ on the path is at least the minimum of $f$ on $L_i \cap U$.
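The bound behind step 1(b) can be sketched for a planar example: the minimum of $f$ on the perpendicular bisector of a pair of points is at most the maximum of $f$ on any path joining them, and in particular on the segment between them. Everything named here is an assumption for illustration: the quadratic saddle `f`, the sample pair of points, the golden-section line search and the helper names. The sample pair does not have matched function values as Algorithm 3.1 assumes, and step 1(a) is not performed.

```python
import math


def f(p):
    # Hypothetical quadratic saddle (not from the paper): critical value 0 at the origin.
    x, y = p
    return x * x - y * y


def golden_section_min(g, lo, hi, tol=1e-10):
    """Minimize a unimodal function g on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if g(c) < g(d):
            hi, d = d, c
            c = hi - inv_phi * (hi - lo)
        else:
            lo, c = c, d
            d = lo + inv_phi * (hi - lo)
    return 0.5 * (lo + hi)


def bounds_from_pair(x, y, half_width=2.0, samples=2001):
    """Return (z, lower, upper): minimizer on the bisector, its value, and the segment maximum."""
    mid = ((x[0] + y[0]) / 2.0, (x[1] + y[1]) / 2.0)
    nx, ny = x[0] - y[0], x[1] - y[1]
    norm = math.hypot(nx, ny)
    dx, dy = -ny / norm, nx / norm  # unit direction of the bisector, orthogonal to x - y

    def on_line(t):
        return f((mid[0] + t * dx, mid[1] + t * dy))

    t_star = golden_section_min(on_line, -half_width, half_width)
    z = (mid[0] + t_star * dx, mid[1] + t_star * dy)
    lower = f(z)
    upper = max(
        f((x[0] + k / (samples - 1) * (y[0] - x[0]),
           x[1] + k / (samples - 1) * (y[1] - x[1])))
        for k in range(samples)
    )
    return z, lower, upper


z, lower, upper = bounds_from_pair((0.2, -1.0), (-0.1, 1.1))
print(z, lower, upper)
```

For this quadratic the two values bracket the critical value 0. The golden-section search is valid here only because $f$ restricted to the bisector is convex for this pair; a general implementation would need a more careful line search.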

Step 1(a) is analogous to step 2 of Algorithm 2.1. Algorithm 3.1 can be seen as
an improvement of Algorithm 2.1. The bisection algorithm in Algorithm 2.1 gives us
a reliable way of finding the critical point, and step 1(b) in Algorithm 3.1 reduces
the distance between the components of the level sets as fast as possible.

In practice, step 1(a) is difficult, and is performed only when the algorithm runs
into difficulties. In fact, this step was not performed in our numerical experiments
in Section 8. However, we can construct simple functions for which the affine space
$L_i$ does not separate the two components containing $x_i$ and $y_i$ in $(\mathrm{lev}_{\le f(x_i)} f) \cap U$
in step 1(b) if step 1(a) were not performed.

In the minimum distance problem in step 1(a), notice that if $f$ is $C^1$ and the
gradients of $f$ at a pair of points are nonzero and do not point in opposite
directions, then in principle we can perturb the points along paths that decrease the
distance between them while not increasing their function values. Of course, a good
approximation of a minimizing pair may be hard to compute in practice: existing
path-based algorithms for finding mountain passes face analogous computational
challenges. One may employ the heuristic in Remark 5.7 for this problem.

In step 2, continuity of $f$ and $p$ tells us that $f(x_{i+1}) = f(z_i)$. We shall see
in Theorem 4.8 that under added conditions, $\{f(x_i)\}_i$ is an increasing sequence
that converges to the critical value $f(\bar{x})$. Furthermore, Propositions 4.5 and 5.3
state that under added conditions, $\{M_i\}_i$ are upper bounds on $f(\bar{x})$ that converge
R-superlinearly to $f(\bar{x})$, where R-superlinear convergence is defined as follows.

Definition 3.2. A sequence in R converges R-superlinearly to zero if its absolute 
value is bounded by a superlinearly convergent sequence. 
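A small numerical illustration of Definition 3.2, using made-up sequences that are not from the paper: `bound` converges superlinearly to zero (successive ratios tend to zero), and `seq` is dominated by it in absolute value without being monotone itself.

```python
import math

# Hypothetical sequences for illustration only.
bound = [2.0 ** (-(2 ** k)) for k in range(1, 7)]            # superlinear: ratios tend to 0
seq = [math.cos(k) * b for k, b in zip(range(1, 7), bound)]   # oscillates, |seq_k| <= bound_k

# Successive ratios of the bounding sequence.
ratios = [bound[k + 1] / bound[k] for k in range(len(bound) - 1)]
print(ratios)
```

Here `seq` converges R-superlinearly to zero in the sense of the definition, even though its own successive ratios need not behave well.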

4. SUPERLINEAR CONVERGENCE OF THE LOCAL ALGORITHM 

When $f : \mathbb{R}^n \to \mathbb{R}$ is a quadratic whose Hessian has one negative eigenvalue and
$n-1$ positive eigenvalues, Algorithm 3.1 converges to the critical point in one step.
One might expect that if $f$ is $C^2$, then Algorithm 3.1 converges quickly. In this
section, we will prove Theorem 4.8 on the superlinear convergence of Algorithm
3.1.

Recall that the Morse index of a critical point is the maximum dimension of a
subspace on which the Hessian is negative definite, and a critical point is
nondegenerate if its Hessian is invertible, and degenerate otherwise. In the smooth finite
dimensional case, the Morse index equals the number of negative eigenvalues of the
Hessian. If a function $f : \mathbb{R}^n \to \mathbb{R}$ is $C^2$ in a neighborhood of a nondegenerate
critical point $\bar{x}$ of Morse index 1, we can readily make the following assumptions.

Assumption 4.1. Assume that $\bar{x} = 0$ and $f(0) = 0$, and the Hessian $H = H(0)$
is a diagonal matrix with entries $a_1, a_2, \dots, a_{n-1}, a_n$ in decreasing order, of which
$a_n$ is negative and $a_{n-1}$ is the smallest positive eigenvalue.

Another assumption that we will use quite often in this section and the next is
on the local approximation of $f$ near $0$.

Assumption 4.2. For $\delta \in (0, \min\{a_{n-1}, -a_n\})$, assume $\theta > 0$ is small enough so
that

$$\Big| f(x) - \sum_{j=1}^{n} a_j x^2(j) \Big| \le \delta |x|^2 \quad \text{for all } x \in \mathbb{B}(0, \theta).$$

This particular choice of $\theta$ gives a region $\mathbb{B}(0, \theta)$ where Figure 4.1 is valid. We
shall use $\mathring{\mathbb{B}}$ to denote the open ball.

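Here is our first result on step 1(a) of Algorithm 3.1.

Proposition 4.3. Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is $C^2$, and $\bar{x}$ is a nondegenerate critical
point of Morse index 1 such that $f(\bar{x}) = c$. If $\theta > 0$ is sufficiently small, then for
any $\epsilon > 0$ (depending on $\theta$) sufficiently small,

(1) $(\mathrm{lev}_{\le c-\epsilon} f) \cap \mathbb{B}(\bar{x}, \theta)$ has exactly two path connected components, and

(2) There is a pair $(\tilde{x}, \tilde{y})$, where $\tilde{x}$ and $\tilde{y}$ lie in distinct components of
$(\mathrm{lev}_{\le c-\epsilon} f) \cap \mathbb{B}(\bar{x}, \theta)$, such that $|\tilde{x} - \tilde{y}|$ is the distance between the two
components in $(\mathrm{lev}_{\le c-\epsilon} f) \cap \mathbb{B}(\bar{x}, \theta)$.

Proof. Suppose that Assumption 4.1 holds. Choose some $\delta \in (0, \min\{a_{n-1}, -a_n\})$
and a corresponding $\theta > 0$ such that Assumption 4.2 holds. A simple bound on
$f(x)$ on $\mathbb{B}(0, \theta)$ is therefore:

$$\sum_{j=1}^{n} (a_j - \delta) x^2(j) \le f(x) \le \sum_{j=1}^{n} (a_j + \delta) x^2(j). \tag{4.1}$$

So if $\epsilon$ is small enough, the level set $S := \mathrm{lev}_{\le -\epsilon} f$ satisfies

$$S_+ \cap \mathbb{B}(0, \theta) \subset S \cap \mathbb{B}(0, \theta) \subset S_- \cap \mathbb{B}(0, \theta),$$

where

$$S_+ := \Big\{ x \ \Big|\ \sum_{j=1}^{n} (a_j + \delta) x^2(j) \le -\epsilon \Big\}, \qquad S_- := \Big\{ x \ \Big|\ \sum_{j=1}^{n} (a_j - \delta) x^2(j) \le -\epsilon \Big\},$$

and $S_+ \cap \mathbb{B}(0, \theta)$ is nonempty. Figure 4.1 shows a two-dimensional cross section
of the sets $S_+$ and $S_-$ through the critical point $0$ and the closest points between
components in $S_+$ and $S_-$.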

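Step 1: Calculate variables in Figure 4.1.

The two points in distinct components of $S_+$ closest to each other are the points
$\left(0, \pm\sqrt{\frac{\epsilon}{-a_n - \delta}}\right)$, and one easily calculates the values of $b$ and $c$ (which are the
distances between $0$ and $S_-$, and between $0$ and $S_+$, respectively) in the diagram
to be $\sqrt{\frac{\epsilon}{-a_n + \delta}}$ and $\sqrt{\frac{\epsilon}{-a_n - \delta}}$. Thus the distance between the two components of
$S$ is at most $2\sqrt{\frac{\epsilon}{-a_n - \delta}}$. The points in $S$ that minimize the distance between the
components must lie in two cylinders $C_1$ and $C_2$ defined by

$$\begin{aligned} C_1 &:= \mathbb{B}^{n-1}(0, a) \times [b - 2c, -b] \subset \mathbb{R}^{n-1} \times \mathbb{R}, \\ C_2 &:= \mathbb{B}^{n-1}(0, a) \times [b, 2c - b] \subset \mathbb{R}^{n-1} \times \mathbb{R}, \end{aligned} \tag{4.2}$$

for some $a > 0$. In other words, $C_1$ and $C_2$ are cylinders with spherical base of
radius $a$ such that

$$(S_- \setminus S_+) \cap (\mathbb{R}^{n-1} \times [b - 2c, 2c - b]) \cap \mathbb{B}(0, \theta) \subset C_1 \cup C_2.$$

They are represented as the left and right rectangles in Figure 4.1.

Figure 4.1. Local structure of saddle point.

We now find a value of $a$. We can let $x(n) = 2c - b$, and we need

$$\sum_{j=1}^{n-1} (a_j - \delta) x^2(j) + (a_n - \delta) x^2(n) \le -\epsilon$$

$$\Rightarrow \ \sum_{j=1}^{n-1} (a_j - \delta) x^2(j) + (a_n - \delta) \left( 2\sqrt{\frac{\epsilon}{-a_n - \delta}} - \sqrt{\frac{\epsilon}{-a_n + \delta}} \right)^2 \le -\epsilon.$$

Continuing the arithmetic gives

$$\begin{aligned} \sum_{j=1}^{n-1} (a_j - \delta) x^2(j) &\le \epsilon (-a_n + \delta) \left( \frac{4}{-a_n - \delta} + \frac{1}{-a_n + \delta} - \frac{4}{\sqrt{-a_n - \delta}\sqrt{-a_n + \delta}} \right) - \epsilon \\ &\le \epsilon (-a_n + \delta) \left( \frac{4}{-a_n - \delta} + \frac{1}{-a_n + \delta} - \frac{4}{-a_n} \right) - \epsilon \\ &\le \frac{8 \epsilon \delta}{-a_n - \delta}. \end{aligned}$$

The radius is maximized when $x(1) = x(2) = \cdots = x(n-2) = 0$ and

$$x(n-1) = 2\sqrt{\frac{2 \epsilon \delta}{(a_{n-1} - \delta)(-a_n - \delta)}},$$

which gives our value of $a$.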
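The estimate for the radius can be sanity-checked numerically. The sketch below assumes the values $b = \sqrt{\epsilon/(-a_n+\delta)}$, $c = \sqrt{\epsilon/(-a_n-\delta)}$ and $a = 2\sqrt{2\epsilon\delta/((a_{n-1}-\delta)(-a_n-\delta))}$ from Step 1, and compares $a$ with the largest radius allowed by the defining inequality of $S_-$ at height $x(n) = 2c - b$. The sample parameter values and the function name `cylinder_quantities` are arbitrary choices for illustration.

```python
import math


def cylinder_quantities(a_n, a_nm1, delta, eps):
    """Return (a, b, c) for 0 < delta < min(a_nm1, -a_n), using the formulas assumed above."""
    b = math.sqrt(eps / (-a_n + delta))
    c = math.sqrt(eps / (-a_n - delta))
    a = 2.0 * math.sqrt(2.0 * eps * delta / ((a_nm1 - delta) * (-a_n - delta)))
    return a, b, c


# Arbitrary sample values: one negative eigenvalue a_n, smallest positive eigenvalue a_nm1.
a_n, a_nm1, delta, eps = -1.0, 2.0, 0.1, 0.01
a, b, c = cylinder_quantities(a_n, a_nm1, delta, eps)

# Largest r with (a_nm1 - delta) r^2 + (a_n - delta) (2c - b)^2 <= -eps.
exact_r = math.sqrt(((-a_n + delta) * (2.0 * c - b) ** 2 - eps) / (a_nm1 - delta))
print(a, exact_r)
```

For these sample values the exact radius is smaller than $a$, as expected, since $a$ comes from a relaxed bound.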
Step 2: $(\mathrm{lev}_{\le -\epsilon} f) \cap \mathbb{B}(0, \theta)$ has exactly two components if $\epsilon$ is small
enough.

Note that $(\mathrm{lev}_{\le -\epsilon} f) \cap \mathbb{B}(0, \theta)$ does not intersect the subspace $L' := \{x \mid x(n) = 0\}$,
since $f(x) \ge 0$ for all $x \in L' \cap \mathbb{B}(0, \theta)$. We proceed to show that

$$U_< := \{x \mid x(n) < 0\} \cap \mathring{\mathbb{B}}(0, \theta)$$

contains exactly one path connected component of the level set if $\epsilon$ is small enough. A similar
statement for $U_>$ defined in a similar way will allow us to conclude that $(\mathrm{lev}_{\le -\epsilon} f) \cap
\mathring{\mathbb{B}}(0, \theta)$ has exactly two components.

Consider two points $v_1, v_2$ in $(\mathrm{lev}_{\le -\epsilon} f) \cap U_<$. We want to find a path connecting
$v_1$ and $v_2$ and contained in $(\mathrm{lev}_{\le -\epsilon} f) \cap U_<$. We may assume that $v_1(n) \le v_2(n) < 0$.
By the continuity of the Hessian, assume that $\theta$ is small enough so that for all
$x \in \mathring{\mathbb{B}}(0, \theta)$, the top left principal submatrix of $H(x)$ corresponding to the first
$n - 1$ elements is positive definite. Consider the subspace $L'(\alpha) := \{x \mid x(n) = \alpha\}$.
The positive definiteness of the submatrix of $H(x)$ on $\mathbb{B}(0, \theta)$ tells us that $f$ is
strictly convex on $\mathbb{B}(0, \theta) \cap L'(\alpha)$.

If $v_1(n) = v_2(n)$, then the line segment connecting $v_1$ and $v_2$ lies in $(\mathrm{lev}_{\le -\epsilon} f) \cap
U_< \cap \mathbb{B}(0, \theta)$ by the convexity of $f$ on $L'(v_1(n)) \cap \mathring{\mathbb{B}}(0, \theta)$. Otherwise, assume
that $v_1(n) < v_2(n)$.

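Here is a lemma that we will need for the proof.

Lemma 4.4. Suppose Assumption 4.1 holds. We can reduce $\theta > 0$ and $\delta > 0$ if
necessary so that Assumption 4.2 is satisfied, and the $n$th component of $\nabla f(x)$ is
positive for all $x \in (\mathrm{lev}_{\le 0} f) \cap \mathbb{B}(0, \theta) \cap \{x \mid x(n) < 0\}$.

Proof. We first define $S_-$ by

$$S_- := \Big\{ x \ \Big|\ (a_{n-1} - \delta) \sum_{j=1}^{n-1} x^2(j) + (a_n - \delta) x^2(n) \le 0 \Big\}.$$

It is clear that $(a_{n-1} - \delta) \sum_{j=1}^{n-1} x^2(j) + (a_n - \delta) x^2(n) \le f(x)$ for all $x \in \mathbb{B}(0, \theta)$, so
$(\mathrm{lev}_{\le 0} f) \cap \mathbb{B}(0, \theta) \subset S_- \cap \mathbb{B}(0, \theta)$.

We now use the expansion $\nabla f(x) = H(0)x + o(|x|)$, and prove that the $n$th
component of $\nabla f(x)$ is positive for all $x \in S_- \cap \mathbb{B}(0, \theta) \cap \{x \mid x(n) < 0\}$. We can
reduce $\theta$ so that $|\nabla f(x) - H(0)x| < \delta |x|$ for all $x \in \mathbb{B}(0, \theta)$. Note that if $x \in S_-$,
then

$$\begin{aligned} & (a_{n-1} - \delta) \sum_{j=1}^{n-1} x^2(j) + (a_n - \delta) x^2(n) \le 0 \\ \Rightarrow \ & (a_{n-1} - \delta) |x|^2 + (a_n - a_{n-1}) x^2(n) \le 0 \\ \Rightarrow \ & |x| \le \sqrt{\frac{a_{n-1} - a_n}{a_{n-1} - \delta}} \, (-x(n)). \end{aligned}$$

The $n$th component of $\nabla f(x)$ is bounded from below by

$$a_n x(n) - \delta |x| \ge a_n x(n) + \delta \sqrt{\frac{a_{n-1} - a_n}{a_{n-1} - \delta}} \, x(n).$$

Provided that $\delta$ is small enough, the term above is positive since $x(n) < 0$. □

We now return to show that there is a path connecting $v_1$ and $v_2$. Note that
$S_+ \cap \mathbb{B}(0, \theta) \cap \{x \mid x(n) < 0\}$ is a convex set. (To see this, note that $S_+ \cap \{x \mid
x(n) < 0\}$ can be rotated so that it is the epigraph of a convex function.) Since
$S_+ \cap \mathring{\mathbb{B}}(0, \theta) \subset (\mathrm{lev}_{\le -\epsilon} f) \cap \mathring{\mathbb{B}}(0, \theta)$, the open line segment connecting the points
$(0, -\theta), (0, -c) \in \mathbb{R}^{n-1} \times \mathbb{R}$ lies in $(\mathrm{lev}_{\le -\epsilon} f) \cap \mathring{\mathbb{B}}(0, \theta)$. If $-\theta < v_1(n) < v_2(n) \le
-c$, the piecewise linear path connecting $v_2$ to $(0, v_2(n))$ to $(0, v_1(n))$ to $v_1$ lies in
$(\mathrm{lev}_{\le -\epsilon} f) \cap \mathring{\mathbb{B}}(0, \theta)$.

In the case when $v_2(n) > -c$, we see that $v_2$ must lie in $C_1$. Lemma 4.4 tells us
that the line segment joining $v_2$ and $v_2 + (0, -c - v_2(n))$ lies in $(\mathrm{lev}_{\le -\epsilon} f) \cap \mathring{\mathbb{B}}(0, \theta)$.
This allows us to find a path connecting $v_2$ to $v_1$.

Step 3: $\tilde{x}$ and $\tilde{y}$ lie in $\mathring{\mathbb{B}}(0, \theta)$.

The points $\tilde{x}$ and $\tilde{y}$ must lie in $C_1$ and $C_2$ respectively, and both $C_1$ and $C_2$ lie
in $\mathring{\mathbb{B}}(0, \theta)$ if $\epsilon$ is small enough. Therefore, we can minimize over the compact sets
$(\mathrm{lev}_{\le -\epsilon} f) \cap C_1$ and $(\mathrm{lev}_{\le -\epsilon} f) \cap C_2$, which tells us that a minimizing pair $(\tilde{x}, \tilde{y})$
exists. □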



In fact, under the assumptions of Proposition 4.3, $\tilde{x}$ and $\tilde{y}$ are unique, but all we
need in the proof of Proposition 4.5 below is that $\tilde{x}$ and $\tilde{y}$ lie in the sets $C_1$ and $C_2$
defined by (4.2) respectively and represented as rectangles in Figure 4.1. We defer
the proof of uniqueness to Proposition 5.4.

Our next result is on a bound for possible locations of $z_i$ in step 1(b).

Proposition 4.5. Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is $C^2$, and $\bar{x}$ is a nondegenerate critical
point of Morse index 1 such that $f(\bar{x}) = c$. If $\theta$ is small enough, then for all small
$\epsilon > 0$ (depending on $\theta$),

(1) Two closest points of the two components of $(\mathrm{lev}_{\le c-\epsilon} f) \cap \mathring{\mathbb{B}}(\bar{x}, \theta)$, say $\tilde{x}$ and
$\tilde{y}$, exist,

(2) For any such points $\tilde{x}$ and $\tilde{y}$, $f$ is strictly convex on $L \cap \mathring{\mathbb{B}}(\bar{x}, \theta)$, where $L$
is the orthogonal bisector of $\tilde{x}$ and $\tilde{y}$, and

(3) $f$ has a unique minimizer on $L \cap \mathring{\mathbb{B}}(\bar{x}, \theta)$. Furthermore, $\min_{L \cap \mathring{\mathbb{B}}(\bar{x}, \theta)} f \le
f(\bar{x}) \le \max_{[\tilde{x}, \tilde{y}]} f$.

Proof. Suppose that Assumption 4.1 holds, and choose $\delta \in (0, \min\{a_{n-1}, -a_n\})$.
Suppose that $\theta > 0$ is small enough such that Assumption 4.2 holds. Throughout
this proof, we assume all vectors accented with a hat '$\wedge$' are of Euclidean length 1.
It is clear that $f(\tilde{x}) = f(\tilde{y}) = -\epsilon$. Point (1) of the result comes from Proposition
4.3. We first prove the following lemma.

Lemma 4.6. Suppose Assumptions 4.1 and 4.2 hold. If $\theta > 0$ is small enough,
then for all small $\epsilon > 0$ (depending on $\theta$), two closest points of the two components
of $(\mathrm{lev}_{\le -\epsilon} f) \cap \mathring{\mathbb{B}}(0, \theta)$, say $\tilde{x}$ and $\tilde{y}$, exist. Let $L$ be the perpendicular bisector of $\tilde{x}$
and $\tilde{y}$. Then

$$(\mathrm{lev}_{\le 0} f) \cap L \cap \mathring{\mathbb{B}}(0, \theta) \subset \mathbb{B}^{n-1}\left(0, \alpha \sqrt{\frac{-a_n + \delta}{a_{n-1} - \delta}}\right) \times (-\alpha, \alpha),$$




whe 

Proof. By Proposition 4.3, the points $\tilde{x}$ and $\tilde{y}$ must exist. We proceed to prove the
rest of Lemma 4.6.

Step 1: Calculate remaining values in Figure 4.1.

We calculated the values of $a$, $b$ and $c$ in step 1 of the proof of Proposition 4.3,
and we proceed to calculate the rest of the variables in Figure 4.1. The middle
rectangle in Figure 4.1 represents the possible locations of midpoints of points in
$C_1$ and $C_2$, and is a cylinder as well. We call this set $M$. The radius of this cylinder
is the same as that of $C_1$ and $C_2$, and the width of this cylinder is $4(c - b)$, which
gives an $o(\delta)$ approximation

$$\begin{aligned} 4(c - b) &= 4\sqrt{\epsilon} \left( \frac{1}{\sqrt{-a_n - \delta}} - \frac{1}{\sqrt{-a_n + \delta}} \right) \\ &= 4\sqrt{\epsilon} \, \frac{\sqrt{-a_n + \delta} - \sqrt{-a_n - \delta}}{\sqrt{(-a_n - \delta)(-a_n + \delta)}} \\ &= 4\sqrt{\frac{\epsilon}{-a_n}} \left( \frac{\delta}{-a_n} + o(\delta) \right). \end{aligned}$$

These calculations suffice for the calculations in step 2 of this proof.

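Step 2: Set up optimization problem for bound on $(\mathrm{lev}_{\le 0} f) \cap L \cap \mathbb{B}(0, \theta)$.

From the values of $a$ and $b$ calculated previously, we deduce that a vector $c_2 - c_1$,
with $c_i \in C_i$, can be scaled so that it is of the form $(\gamma \frac{a}{b} \hat{v}_1, 1)$, where $\hat{v}_1 \in \mathbb{R}^{n-1}$ is of
norm 1 and $0 \le \gamma \le 1$ (i.e., the norm corresponding to the first $n - 1$ coordinates
is at most $\frac{a}{b}$). These are possible normals for $L$, the perpendicular bisector of $\tilde{x}$
and $\tilde{y}$. The formula for $\frac{a}{b}$ is

$$\frac{a}{b} = 2\sqrt{\frac{2 \epsilon \delta}{(a_{n-1} - \delta)(-a_n - \delta)}} \sqrt{\frac{-a_n + \delta}{\epsilon}} = 2\sqrt{\frac{2 \delta (-a_n + \delta)}{(a_{n-1} - \delta)(-a_n - \delta)}}.$$

So we can represent a normal of the affine space $L$ as

$$\left( 2\gamma_1 \sqrt{\frac{2 \delta (-a_n + \delta)}{(a_{n-1} - \delta)(-a_n - \delta)}} \, \hat{v}_1, \ 1 \right) \quad \text{for some } 0 \le \gamma_1 \le 1. \tag{4.3}$$

We now proceed to bound the minimum of $f$ on all possible perpendicular bisectors
of $c_1$ and $c_2$ within $\mathbb{B}(0, \theta)$, where $c_1 \in C_1$ and $c_2 \in C_2$. We find the largest value
of $\alpha$ such that

• there is a point of the form $(v_2, \alpha)$ lying in $S_-$, where

$$S_- := \Big\{ x \ \Big|\ (a_{n-1} - \delta) \sum_{j=1}^{n-1} x^2(j) + (a_n - \delta) x^2(n) \le 0 \Big\} \subset \mathbb{R}^{n-1} \times \mathbb{R}.$$

• $(v_2, \alpha) \in L$ for some affine space $L$ passing through a point $p \in M$ and
having a normal vector of the form in Formula (4.3).

The set $S_-$ is the same as that defined in the proof of Lemma 4.4. Note that
$S_- \cap \mathbb{B}(0, \theta) \supset (\mathrm{lev}_{\le 0} f) \cap \mathbb{B}(0, \theta)$, and this largest value of $\alpha$ is an upper bound on
the absolute value of the $n$th coordinate of elements in $(\mathrm{lev}_{\le 0} f) \cap L \cap \mathbb{B}(0, \theta)$.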
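Step 3: Solving for $\alpha$.

For a point $(v_2, \alpha) \in S_-$, where $v_2 = (x(1), x(2), \dots, x(n-1)) \in \mathbb{R}^{n-1}$, we have

$$\begin{aligned} & (a_{n-1} - \delta) \sum_{j=1}^{n-1} x^2(j) + (a_n - \delta) \alpha^2 \le 0 \\ \Rightarrow \ & |v_2|^2 \le \frac{-a_n + \delta}{a_{n-1} - \delta} \, \alpha^2 \\ \Rightarrow \ & |v_2| \le \sqrt{\frac{-a_n + \delta}{a_{n-1} - \delta}} \, \alpha. \end{aligned}$$

Therefore, we can write $(v_2, \alpha)$ as

$$\left( \gamma_2 \sqrt{\frac{-a_n + \delta}{a_{n-1} - \delta}} \, \alpha \, \hat{v}_2, \ \alpha \right),$$

where $\hat{v}_2 \in \mathbb{R}^{n-1}$ is a vector of unit norm, and $0 \le \gamma_2 \le 1$. We can assume that $p$
has coordinates

$$\left( 2\gamma_3 \sqrt{\frac{2 \epsilon \delta}{(a_{n-1} - \delta)(-a_n - \delta)}} \, \hat{v}_3, \ 2\gamma_4 \sqrt{\frac{\epsilon}{-a_n}} \left( \frac{\delta}{-a_n} + o(\delta) \right) \right),$$

where $\hat{v}_3 \in \mathbb{R}^{n-1}$ is some vector of unit norm, and $0 \le \gamma_3, \gamma_4 \le 1$. Note that the
$n$th component is half the width of $M$. Hence a possible tangent on $L$ is

$$\left( \gamma_2 \sqrt{\frac{-a_n + \delta}{a_{n-1} - \delta}} \, \alpha \, \hat{v}_2, \ \alpha \right) - \left( 2\gamma_3 \sqrt{\frac{2 \epsilon \delta}{(a_{n-1} - \delta)(-a_n - \delta)}} \, \hat{v}_3, \ 2\gamma_4 \sqrt{\frac{\epsilon}{-a_n}} \left( \frac{\delta}{-a_n} + o(\delta) \right) \right).$$

To simplify notation, note that we only require an $O(\delta)$ approximation of $\alpha$, so we
can take the terms like $-a_n + \delta$ and $-a_n - \delta$ to be $-a_n + O(\delta)$ and so on. The
dot product of the above vector and the normal of the affine space $L$ calculated in
Formula (4.3) must be zero, which after some simplification gives:

$$\left( \left( \gamma_2 \sqrt{\tfrac{-a_n}{a_{n-1}}} + O(\delta) \right) \alpha \hat{v}_2 - 2\gamma_3 \left( \sqrt{\tfrac{2 \epsilon \delta}{a_{n-1}(-a_n)}} + O(\delta^{3/2}) \right) \hat{v}_3, \ \alpha - 2\gamma_4 \left( \sqrt{\tfrac{\epsilon}{-a_n}} \, \tfrac{\delta}{-a_n} + o(\delta) \right) \right) \cdot \left( \left( 2\gamma_1 \sqrt{\tfrac{2\delta}{a_{n-1}}} + O(\delta^{3/2}) \right) \hat{v}_1, \ 1 \right) = 0.$$

At this point, we remind the reader that the $O(\delta^k)$ terms mean that there exists
some $K > 0$ such that if $\delta$ were small enough, we can find terms $t_1$ to $t_4$ such that
$|t_i| < K\delta^k$ and the formula above is satisfied by $t_i$ in place of the $O(\delta^k)$ terms.
Further arithmetic gives

$$\begin{aligned} \alpha &= \frac{4\gamma_1\gamma_3 \sqrt{\frac{2 \epsilon \delta}{a_{n-1}(-a_n)}} \sqrt{\frac{2\delta}{a_{n-1}}} \, (\hat{v}_3 \cdot \hat{v}_1) + 2\gamma_4 \sqrt{\frac{\epsilon}{-a_n}} \, \frac{\delta}{-a_n} + o(\delta)}{1 + 2\gamma_1\gamma_2 \sqrt{\frac{-a_n}{a_{n-1}}} \sqrt{\frac{2\delta}{a_{n-1}}} \, (\hat{v}_2 \cdot \hat{v}_1) + O(\delta^{3/2})} \\ &= \left( 4\gamma_1\gamma_3 \sqrt{\frac{2 \epsilon \delta}{a_{n-1}(-a_n)}} \sqrt{\frac{2\delta}{a_{n-1}}} \, (\hat{v}_3 \cdot \hat{v}_1) + 2\gamma_4 \sqrt{\frac{\epsilon}{-a_n}} \, \frac{\delta}{-a_n} + o(\delta) \right) \left( 1 + O(\sqrt{\delta}) \right). \end{aligned}$$

To find an upper bound for $\alpha$, it is clear that we should take $\gamma_1 = \gamma_3 = \gamma_4 = 1$ and
$\hat{v}_3 \cdot \hat{v}_1 = 1$. The $O(\sqrt{\delta})$ term is superfluous, and this simplifies to give

$$\alpha \le \delta \sqrt{\frac{\epsilon}{-a_n}} \left( \frac{8}{a_{n-1}} + \frac{2}{-a_n} \right) + o(\delta). \tag{4.5}$$

We could find the minimum possible value of $\alpha$ by this same series of steps and
show that the absolute value would be bounded above by the same bound. This
ends the proof of Lemma 4.6. □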

It is clear that the minimum value of $f$ on $L \cap \mathring{\mathbb{B}}(0, \theta)$ is at most 0, since $L$
intersects the axis corresponding to the $n$th coordinate and $f$ is nonpositive there.
Therefore the set $(\mathrm{lev}_{\le 0} f) \cap L \cap \mathring{\mathbb{B}}(0, \theta)$ is nonempty, and $f$ has a local minimizer
on $L \cap \mathbb{B}(0, \theta)$.

We now state and prove our second lemma that will conclude the proof of
Proposition 4.5.



Lemma 4.7. Let $L$ be the perpendicular bisector of $x$ and $y$ as defined in point (1)
of Proposition 4.5 with $\bar x = 0$. If $\delta$ and $\theta$ are small enough satisfying Assumptions
4.1 and 4.2, then $f|_{L \cap \mathbb{B}(0,\theta)}$ is strictly convex.

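Proof. The lineality space of $L$, written as $\operatorname{lin}(L)$, is the space of vectors orthogonal
to $x - y$. We can infer from Formula (4.3) that $x - y$ is a scalar multiple of a vector
of the form $(w, 1)$, where $w \in \mathbb{R}^{n-1}$ satisfies $|w| \to 0$ as $\delta \to 0$. We consider a
vector $v \in \operatorname{lin}(L)$ orthogonal to $(w, 1)$ that can be scaled so that $v = (\tilde w, 1)$, where
$(w, 1) \cdot (\tilde w, 1) = 0$, which gives $w \cdot \tilde w = -1$. The Cauchy-Schwarz inequality gives us

$$|\tilde w|\,|w| \ge |w \cdot \tilde w| = 1 \quad \Longrightarrow \quad |\tilde w| \ge |w|^{-1}.$$

So

$$\begin{aligned}
\frac{v^T H(p) v}{v^T v} &= \frac{v^T H(0) v}{v^T v} + \frac{v^T (H(p) - H(0)) v}{v^T v} \\
&= \frac{\sum_{j=1}^{n-1} a_j v(j)^2 + a_n}{\sum_{j=1}^{n-1} v(j)^2 + 1} + \frac{v^T (H(p) - H(0)) v}{v^T v} \\
&\ge \underbrace{\frac{a_{n-1} \sum_{j=1}^{n-1} v(j)^2 + a_n}{\sum_{j=1}^{n-1} v(j)^2 + 1}}_{(1)} + \frac{v^T (H(p) - H(0)) v}{v^T v}.
\end{aligned}$$

Since $\sum_{j=1}^{n-1} v(j)^2 = |\tilde w|^2 \to \infty$ as $\delta \to 0$, the limit of term (1) is $a_{n-1}$, so there is
an open set $\mathbb{B}(0,\theta)$ containing $0$ such that $\frac{v^T H(0) v}{v^T v} > \frac{1}{2} a_{n-1}$ for all $v \in \operatorname{lin}(L) \cap \{x \mid x(n) = 1\}$
and $p \in \mathbb{B}(0,\theta)$. By the continuity of the Hessian, we may reduce $\theta$ if
necessary so that $\|H(p) - H(0)\| < \frac{1}{2} a_{n-1}$ for all $p \in \mathbb{B}(0,\theta)$. Thus $\frac{v^T H(p) v}{v^T v} > 0$
for all $p \in \mathbb{B}(0,\theta)$ and $v \in \operatorname{lin}(L) \cap \{x \mid x(n) = 1\}$ if $\delta$ is small enough.

The vectors of the form $v = (\tilde w, 0)$ do not present additional difficulties as the
corresponding term (1) is at least $a_{n-1}$. This proves that the Hessian $H(p)$ restricted
to $\operatorname{lin}(L)$ is positive definite, and hence the strict convexity of $f$ on $L \cap \mathbb{B}(0,\theta)$. □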
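The behavior of term (1) can be checked numerically. The following is a minimal sketch, not part of the original argument: the eigenvalues `a` and the sample vectors are hypothetical, and the Hessian at the origin is taken to be the diagonal matrix with entries $a_1, \dots, a_n$ as in the display above.

```python
import math

def rayleigh_quotient(a, v):
    """Return v^T diag(a) v / v^T v for a nonzero vector v."""
    num = sum(aj * vj * vj for aj, vj in zip(a, v))
    den = sum(vj * vj for vj in v)
    return num / den

def term1_lower_bound(a, r):
    """Term (1): (a_{n-1} r^2 + a_n) / (r^2 + 1), where r is the norm of the first n-1 entries."""
    return (a[-2] * r * r + a[-1]) / (r * r + 1.0)

# Hypothetical eigenvalues a_1 >= a_2 > 0 > a_3 (Morse index 1).
a = [3.0, 1.0, -2.0]
angle = 0.7  # arbitrary direction for the first n-1 coordinates
for r in (1.0, 10.0, 100.0):
    v = [r * math.cos(angle), r * math.sin(angle), 1.0]
    print(r, rayleigh_quotient(a, v), term1_lower_bound(a, r))
```

As the norm `r` of the first $n-1$ coordinates grows, the lower bound approaches $a_{n-1}$, which is the mechanism used in the proof.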
Since $f$ has a local minimizer in $L \cap \mathbb{B}(0,\theta)$ and is strictly convex there, we have
(2) and the first part of (3). The inequality $f(\bar x) \le \max_{[x,y]} f$ follows easily
from the fact that the line segment $[x, y]$ intersects the set $\{x \mid x(n) = 0\}$, on which
$f$ is nonnegative. □

Here is our theorem on the convergence of Algorithm 3.1.

Theorem 4.8. Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is $C^2$ in a neighborhood of a nondegenerate
critical point $\bar x$ of Morse index 1. If $\theta > 0$ is sufficiently small and $x_0$ and $y_0$ are
chosen such that

(a) $x_0$ and $y_0$ lie in the two different components of $(\operatorname{lev}_{\le f(x_0)} f) \cap \mathbb{B}(\bar x,\theta)$,

(b) $f(x_0) = f(y_0) < f(\bar x)$,

then Algorithm 3.1 with $U = \mathbb{B}(\bar x,\theta)$ generates a sequence of iterates $\{x_i\}_i$ and
$\{y_i\}_i$ lying in $\mathbb{B}(\bar x,\theta)$ such that the function values $\{f(x_i)\}_i$ and $\{f(y_i)\}_i$ converge
to $f(\bar x)$ superlinearly, and the iterates $\{x_i\}_i$ and $\{y_i\}_i$ converge to $\bar x$ superlinearly.

Proof. As usual, suppose Assumption 4.1 holds, and $\delta$ and $\theta$ are chosen so that
Assumption 4.2 holds.

Step 1: Linear convergence of $f(x_i)$ to critical value $f(\bar x)$.

Let $\epsilon = -f(x_i)$. The next iterate $x_{i+1}$ satisfies $f(x_{i+1}) = f(z_i)$, and is bounded
from below by

f{x,+i) > {an - S)a' = -6(52 + + dS^), 

where a is the value calculated in Lemma 4.6. The ratio between the previous
function value and the next function value is at most

\a„_i —a,,/ 

This ratio goes to $0$ as $\delta \searrow 0$, so we can choose some $\delta$ small enough so that
$\rho < 1$. We can choose $\theta$ corresponding to the value of $\delta$ satisfying Assumption 4.2.
This shows that the convergence to $0$ of the function values $f(x_{i+1}) = f(y_{i+1})$ in
Algorithm 3.1 is linear provided $x_0$ and $y_0$ lie in $\mathbb{B}(0,\theta)$ and $\epsilon$ is small enough by
Proposition 4.3. We can reduce $\theta$ if necessary so that $f(x) \ge -\epsilon$ for all $x \in \mathbb{B}(0,\theta)$,
so the condition on $\epsilon$ does not present difficulties.

Step 2: Superlinear convergence of $f(x_i)$ to critical value $f(\bar x)$.

Choose a sequence $\{\delta_k\}_k$ so that $\delta_k \searrow 0$ monotonically. Corresponding to $\delta_k$,
we can choose $\theta_k$ satisfying Assumption 4.2. Since $\{x_i\}_i$ and $\{y_i\}_i$ converge to
$0$, for any $k \in \mathbb{Z}_+$, we can find some $i^* \in \mathbb{Z}_+$ so that the cylinders $C_1$ and $C_2$
constructed in Figure 4.1 corresponding to $\epsilon = -f(x_i)$ and $\delta = \delta_k$ lie wholly in
$\mathbb{B}(0,\theta_k)$ for all $i > i^*$. As remarked in step 3 of the proof of Proposition 4.3, $x_i$ and
$y_i$ must lie inside $C_1$ and $C_2$, so we can take $\delta = \delta_k$ for the ratio $\rho$. This means
that $\frac{|f(x_{i+1})|}{|f(x_i)|} \le \rho(\delta_k)$ for all $i > i^*$. As $\rho(\delta) \searrow 0$ as $\delta \searrow 0$, this means that we have
superlinear convergence of the $f(x_i)$ to the critical value $f(\bar x)$.

Step 3: Superlinear convergence of $x_i$ to the critical point $\bar x$.

We now proceed to prove that the distance between the critical point $0$ and
the iterates decreases superlinearly by calculating the value $\frac{|x_{i+1}|}{|x_i|}$, or alternatively

■l^ljylJ-. The value \xi\ satisfies \xi\^ > ~ -a \8 ■ To find an upper bound for 

$|x_{i+1}|^2$, it is instructive to look at an upper bound for $|x_i|$ first. As can be deduced
from Figure 4.1, an upper bound for $|x_i|^2$ is the square of the distance between $0$
and the furthest point in $C_1$, which is

{2c-bf + a^ = (c+{c-b)f + a^ 

e , eS 8eS 



-Un-S (-a„)2 (a„_i - (5)(-a„ - (5) 



This means that an upper bound for $|x_{i+1}|^2$ is



8eS f 1 1 



0(5). 



From this point, one easily sees that as $i \to \infty$, $\delta \to 0$, and $\frac{|x_{i+1}|^2}{|x_i|^2} \to 0$. This
gives the superlinear convergence of the distance between the critical point and the
iterates $x_i$ that we seek. □
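To make the distinction between Step 1 and Step 2 concrete, the sketch below computes successive error ratios for two synthetic sequences. The sequences are illustrative assumptions, not output of Algorithm 3.1: linear convergence shows ratios bounded by a constant below 1, while superlinear convergence shows ratios tending to 0.

```python
def successive_ratios(errors):
    """Return e_{k+1} / e_k for a sequence of positive errors."""
    return [nxt / cur for cur, nxt in zip(errors, errors[1:])]

# Synthetic error sequences (assumed for illustration only).
linear = [0.5 ** k for k in range(1, 8)]                # e_k = 2^{-k}
superlinear = [2.0 ** -(2 ** k) for k in range(1, 6)]   # e_k = 2^{-2^k}

print("linear ratios:     ", successive_ratios(linear))
print("superlinear ratios:", successive_ratios(superlinear))
```

In practice one would feed in the observed values $|f(x_i) - f(\bar x)|$ or $|x_i - \bar x|$ and inspect whether the ratios decay.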

5. Further properties of the local algorithm 

In this section, we take note of some interesting properties of Algorithm 3.1.
First, we show that it is easy to find $x_{i+1}$ and $y_{i+1}$ in step 2 of Algorithm 3.1.

Proposition 5.1. Suppose the conditions in Theorem 4.8 hold. Consider the sequence
of iterates $\{x_i\}_i$ and $\{y_i\}_i$ generated by Algorithm 3.1. If $i$ is large enough,
then either $x_{i+1} = z_i$ or $y_{i+1} = z_i$ in step 2 of Algorithm 3.1.

Proof. Let $p : [0, 1] \to \mathbb{R}^n$ denote the piecewise linear path connecting $x_i$ to $z_i$ to
$y_i$. It suffices to prove that along $p$, the function $f$ increases to a maximum, and
then decreases. Suppose Assumptions 4.1 and 4.2 hold. The cylinders $C_1$ and $C_2$
in Figure 4.1 are loci for $x_i$ and $y_i$. We assume that $x_i$ lies in $C_2$ in Figure 4.1. The
calculations in (4.4) in Lemma 4.6 tell us that $z_i$ can be written as



where < Ai < A2 < 1, |v2| = 1 and a = 6,J^ + ^f-) + o{S) by 

Therefore, Xi — Zi can be written as 



where vi G K" ^ satisfies 




and a = \J — i-%{-a -S) calculated in the proof of Proposition 14.31 This 

means that the unit vector with direction $x_i - z_i$ converges to the $n$-th elementary
vector as $\delta \searrow 0$. By appealing to Hessians as is done in the proof of Lemma 4.7,
we see that the function $f$ is strictly concave in the line segment $[x_i, z_i]$ if $i$ is
large enough. Similarly, $f$ is strictly concave in the line segment $[y_i, z_i]$ if $i$ is large
enough.
Next, we prove that the function $f$ has only one local maximizer in $p([0, 1])$. In
the case where $\nabla f(z_i) = 0$, the concavity of $f$ on the line segments $[x_i, z_i]$ and
$[y_i, z_i]$ tells us that $z_i$ is the unique maximizer on $p([0, 1])$. We now look at the
case where $\nabla f(z_i) \ne 0$. Since $z_i$ is the minimizer on a subspace with normal $x_i - y_i$,
$\nabla f(z_i)$ is a (possibly negative) multiple of $x_i - y_i$. This means that $\nabla f(z_i) \cdot (x_i - z_i)$
has a different sign than $\nabla f(z_i) \cdot (y_i - z_i)$. In other words, the map $t \mapsto f(p(t))$
increases then decreases. This concludes the proof of the proposition. □

Remark 5.2. Note that in Algorithm 3.1, all we need in step 1 is a good lower
bound of the critical value. We can exploit convexity as proved in Lemma 4.7 and
use cutting plane methods to attain a lower bound for $f$ on $L_i \cap \mathbb{B}(\bar x,\theta)$.
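As an illustration of how a cutting plane method yields a certified lower bound for a convex function, here is a minimal one-dimensional Kelley-type sketch. It is not the authors' implementation: the function name `cutting_plane_bounds`, the test function, and the interval are assumptions, and the restriction to one dimension is made only so that the piecewise linear model can be minimized by checking endpoints and pairwise intersections of cuts.

```python
def cutting_plane_bounds(f, df, lo, hi, iters=40, tol=1e-9):
    """Kelley-type cutting planes for a convex differentiable f on [lo, hi].

    Returns (lower, upper) with lower <= min f <= upper.
    """
    cuts = []  # (slope, intercept) pairs with f(t) >= slope * t + intercept
    x = 0.5 * (lo + hi)
    upper = float("inf")
    lower = float("-inf")
    for _ in range(iters):
        fx, gx = f(x), df(x)
        upper = min(upper, fx)            # best function value seen so far
        cuts.append((gx, fx - gx * x))    # tangent line underestimates convex f

        def model(t):
            return max(s * t + c for s, c in cuts)

        # The convex piecewise linear model attains its minimum at an endpoint
        # or at an intersection of two cuts.
        candidates = [lo, hi]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (s1, c1), (s2, c2) = cuts[i], cuts[j]
                if s1 != s2:
                    t = (c2 - c1) / (s1 - s2)
                    if lo <= t <= hi:
                        candidates.append(t)
        x = min(candidates, key=model)
        lower = model(x)                  # valid lower bound on min f
        if upper - lower <= tol:
            break
    return lower, upper

# Hypothetical convex test function with minimum value 2 at t = 1.
lower, upper = cutting_plane_bounds(lambda t: (t - 1.0) ** 2 + 2.0,
                                    lambda t: 2.0 * (t - 1.0), -3.0, 4.0)
print(lower, upper)
```

The lower bound is valid at every iteration, so the loop can be stopped early when only a rough bound of the critical value is needed.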

Recall from Proposition 4.5 that $M_i$ is a sequence of upper bounds of the critical
value $f(\bar x)$. While it is not even clear that $M_i$ is monotonically decreasing, we can
prove the following convergence result on $M_i$.

Proposition 5.3. Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is $C^2$ in a neighborhood of a nondegenerate
critical point $\bar x$ of Morse index 1, and that the neighborhood $U$ of $\bar x$ and the points $x_0$
and $y_0$ are chosen satisfying the conditions in the statement of Theorem 4.8. Then
in Algorithm 3.1, $M_i := \max_{[x_i, y_i]} f$ converges R-superlinearly to the critical value.

Proof. Suppose Assumption 4.1 holds. An upper bound of the critical value of the
saddle point is obtained by finding the maximum along the line segment joining
two points in $C_1$ and $C_2$, which is bounded from above by

{ai+S)a = [ai+o)- j- jr. 

A more detailed analysis by using cylinders with ellipsoidal base instead of circular 
base tell us that the maximum is bounded above by instead. If (5 > is 

small enough, this value is much smaller than $-f(x_i) = \epsilon$. As $i \to \infty$, the estimates
$-f(x_i)$ converge superlinearly to $0$ by Theorem 4.8, giving us what we need.

□ 

Step 1(a) is important in the analysis of Algorithm 3.1. As explained earlier in
Section 3, it may be difficult to implement this step. Algorithm 3.1 may run fine
without ever performing step 1(a) (see the example in Section 8), but it may need
to be performed occasionally in a practical implementation. The following result
tells us that under the assumptions we have made so far, this problem is locally a
strictly convex problem with a unique solution.

Proposition 5.4. Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is $C^2$ in a neighborhood of a nondegenerate
critical point $\bar x$ of Morse index 1 with critical value $f(\bar x) = c$. Then if $\epsilon > 0$
is small enough, there is a convex neighborhood $U_\epsilon$ of $\bar x$ such that $(\operatorname{lev}_{\le c - \epsilon} f) \cap U_\epsilon$
is a union of two disjoint convex sets.

Consequently, providing $\theta$ is sufficiently small, the pair of nearest points guaranteed
by Proposition 4.3(2) are unique.

Proof. Suppose Assumptions 4.1 and 4.2 hold. In addition, we further assume that

$$|\nabla f(x) - H(0)x| < \delta |x| \quad \text{for all } x \in \mathbb{B}(0,\theta).$$

We can choose $U_\epsilon$ to be the interior of $\operatorname{conv}(C_1 \cup C_2)$, where $C_1$ and $C_2$ are the
cylinders in Figure 4.1 and defined in the proof of Proposition 4.3, but in view of
Theorem 5.6, we shall prove that $U_\epsilon$ can be chosen to be the bigger set $\operatorname{conv}(C_1 \cup C_2)$,
where $C_1$ and $C_2$ are cylinders defined by

$$C_1 := \mathbb{B}^{n-1}(0,\rho) \times [-\beta, -b] \subset \mathbb{R}^{n-1} \times \mathbb{R},$$
$$C_2 := \mathbb{B}^{n-1}(0,\rho) \times [b, \beta] \subset \mathbb{R}^{n-1} \times \mathbb{R},$$

where $\beta$, $\rho$ are constants to be determined. We choose $\beta$ such that

$$\mathbb{B}^{n-1}(0,a) \times \{\beta\} \subset \operatorname{int}(S_+).$$

In particular, $\beta$ satisfies

^/32 > ^(e + a2(a, +^)) 

e , 85(ai + ,5) 



-a„ - 5 \ (a„_i - (5)(-a„ - (5) 

We choose $\beta$ to be any value satisfying the above inequality.

Next, we choose p to be the smallest value such that S- O (R"~^ x [—f3,f3])n 
') C Ci U (72. This calculation is similar to the calculation of a, which gives 

(a„-i - 




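We shall not expand the terms, but remark that $\beta$ and $\rho$ are of $O(\sqrt{\epsilon})$.

The proof of Proposition 4.3 tells us that $\operatorname{conv}(C_1 \cup C_2) \cap \operatorname{lev}_{\le -\epsilon} f$ is a union of
the two nonempty sets $C_1 \cap \operatorname{lev}_{\le -\epsilon} f$ and $C_2 \cap \operatorname{lev}_{\le -\epsilon} f$. It remains to show that
these two sets are strictly convex.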

Any point $x \in C_1$ can be written as

$$x = (x', x_n),$$

where $x' \in \mathbb{R}^{n-1}$ is of norm at most $\rho$, and $-\beta \le x_n \le -b$, where $\beta$ is as calculated
above and b = ^ -a i^i Figure 14.11 This implies that 

$$Hx = (x'', a_n x_n),$$

where $x''$ is of norm at most $a_1 |x'|$. It is clear that as $\delta \searrow 0$, the unit vector in the
direction of $Hx$ converges to $(0, 1)$. This implies that for any $\kappa_1 > 0$, there exists
some $\delta > 0$ such that $\operatorname{unit}(\nabla f(x)) \cdot (0, 1) > 1 - \kappa_1$ for all $x \in C_1$. (Note that $C_1$
depends on $\delta$.) Here, $\operatorname{unit} : \mathbb{R}^n \setminus \{0\} \to \mathbb{R}^n$ is the mapping of a nonzero vector to
the unit vector pointing in the same direction.

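Let $z_1$ and $z_2$ be points in $C_1 \cap (\operatorname{lev}_{\le -\epsilon} f)$. Suppose that $z_1(n) < z_2(n)$, and let
$v = (v_1, v_2) \in \mathbb{R}^{n-1} \times \mathbb{R}$ be a unit vector in the same direction as $z_2 - z_1$. We
further assume, by reducing $\theta$ and $\delta$ as necessary, that $\|H(x) - H(0)\| < \kappa_2$ for all
$x \in C_1 \cap (\operatorname{lev}_{\le -\epsilon} f)$. Suppose $\kappa_1$ and $\kappa_2$ are small enough so that $\sqrt{2\kappa_1} < \sqrt{\frac{a_{n-1} - \kappa_2}{a_{n-1} - a_n}}$.
Note that $v_2 > 0$. At least one of the following two cases on $v_2$ must hold. We prove that
in both cases, the open line segment $(z_1, z_2)$ lies in the interior of $(\operatorname{lev}_{\le -\epsilon} f) \cap C_1$.

Case 1: $v_2 > \sqrt{2\kappa_1}$.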
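In this case, for all $x \in C_1$, we have

$$\begin{aligned}
v \cdot \operatorname{unit}(\nabla f(x)) &= v \cdot (0, 1) + v \cdot \big(\operatorname{unit}(\nabla f(x)) - (0, 1)\big) \\
&\ge v_2 - |v|\,\big|\operatorname{unit}(\nabla f(x)) - (0, 1)\big| \\
&= v_2 - \big|\operatorname{unit}(\nabla f(x)) - (0, 1)\big| \\
&= v_2 - \sqrt{|\operatorname{unit}(\nabla f(x))|^2 + |(0, 1)|^2 - 2\operatorname{unit}(\nabla f(x)) \cdot (0, 1)} \\
&\ge v_2 - \sqrt{2 - 2(1 - \kappa_1)} \\
&> 0.
\end{aligned}$$

This means that along the line segment $[z_1, z_2]$, the function $f$ is strictly monotone.
Therefore, if $z_1, z_2 \in (\operatorname{lev}_{\le -\epsilon} f) \cap C_1$, the open line segment $(z_1, z_2)$ lies in the
interior of $(\operatorname{lev}_{\le -\epsilon} f) \cap C_1$.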

Case 2: $v_2 < \sqrt{\frac{a_{n-1} - \kappa_2}{a_{n-1} - a_n}}$.

Let $H'(0)$ denote the diagonal matrix of size $(n - 1) \times (n - 1)$ with elements
$a_1, \dots, a_{n-1}$. We have

$$\begin{aligned}
v^T H(x) v &= v^T H(0) v + v^T (H(x) - H(0)) v \\
&\ge v_1^T H'(0) v_1 + a_n v_2^2 - |v|^2 \|H(x) - H(0)\| \\
&\ge a_{n-1} |v_1|^2 + a_n v_2^2 - \|H(x) - H(0)\| \\
&> a_{n-1} (1 - v_2^2) + a_n v_2^2 - \kappa_2 \\
&= a_{n-1} + v_2^2 (a_n - a_{n-1}) - \kappa_2 \\
&> a_{n-1} + (\kappa_2 - a_{n-1}) - \kappa_2 \\
&= 0.
\end{aligned}$$

This means that the function $f$ is strictly convex along the line segment $[z_1, z_2]$,
so if $z_1, z_2 \in (\operatorname{lev}_{\le -\epsilon} f) \cap C_1$, the open line segment $(z_1, z_2)$ lies in the interior of
$(\operatorname{lev}_{\le -\epsilon} f) \cap C_1$, concluding the proof of the first part of this result.

To prove the next statement on the uniqueness of the pair of closest points,
suppose that $(x', y')$ and $(x'', y'')$ are distinct pairs whose distance gives the distance
between the components of $(\operatorname{lev}_{\le -\epsilon} f) \cap \mathbb{B}(0,\theta)$, where $\mathbb{B}(0,\theta)$ is as stated
in Proposition 4.3. If $\epsilon$ is small enough, then $\operatorname{conv}(C_1 \cup C_2)$ lies in $\mathbb{B}(0,\theta)$. Then
by the strict convexity of the components of $(\operatorname{lev}_{\le -\epsilon} f) \cap \operatorname{conv}(C_1 \cup C_2)$, the pair
$(\frac{1}{2}(x' + x''), \frac{1}{2}(y' + y''))$ lies in the same components, and the distance between this
pair of points must be the same as that for the pairs $(x', y')$ and $(x'', y'')$. The closest
points in the components of $[\frac{1}{2}(x' + x''), \frac{1}{2}(y' + y'')] \cap \operatorname{lev}_{\le -\epsilon} f$ give a smaller distance
between the components of $(\operatorname{lev}_{\le -\epsilon} f) \cap \mathbb{B}(0,\theta)$, which contradicts the optimality of
the pairs $(x', y')$ and $(x'', y'')$. □

Note that in the case of $\epsilon = 0$, there may be no neighborhood $U_0$ of $\bar x$ such that
$U_0 \cap (\operatorname{lev}_{\le c} f)$ is a union of two convex sets intersecting only at the critical point.
We also note that $U_\epsilon$ depends on $\epsilon$ in our result above. The following example
explains these restrictions.

Example 5.5. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x) = (x_2 - x_1^2)(x_1 - x_2^2)$.
The shaded area in Figure 5.1 is a sketch of $\operatorname{lev}_{\le 0} f$.

Figure 5.1. $\operatorname{lev}_{\le 0} f$ for $f(x) = (x_2 - x_1^2)(x_1 - x_2^2)$

We now explain that the neighborhood $U_\epsilon$ defined in Proposition 5.4 must depend
on $\epsilon$ for this example. For any open $U$ containing $0$, we can always find two points
$p$ and $q$ in a component of $(\operatorname{lev}_{< 0} f) \cap U$ such that the line segment $[p, q]$ does not
lie in $\operatorname{lev}_{\le 0} f$. This implies that the component of $(\operatorname{lev}_{\le -\epsilon} f) \cap U$ is not convex if
$0 < \epsilon < -\max(f(p), f(q))$. ⋄
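The failure of convexity in Example 5.5 can be checked with explicit points. The sketch below is an illustration under assumed choices: the parametrization `witness(t)` is hypothetical and not taken from the text. For small $t > 0$ it returns $p$ and $q$ with $f(p), f(q) < 0$ in the region $\{x_2 > x_1^2,\ x_1 < x_2^2\}$ (they can be joined inside that region by moving horizontally from $p$ and then vertically down to $q$), while the midpoint of $[p, q]$ has $f > 0$.

```python
def f(x1, x2):
    """Function from Example 5.5."""
    return (x2 - x1 ** 2) * (x1 - x2 ** 2)

def witness(t):
    """Hypothetical points p, q near the origin and the midpoint of [p, q]."""
    p = (0.8 * t ** 2, t)
    q = (-t ** 3, t ** 3)
    mid = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    return p, q, mid

for t in (0.2, 0.1, 0.01):
    p, q, mid = witness(t)
    print(t, f(*p), f(*q), f(*mid))
```

Since the points shrink to the origin as $t \to 0$, such a pair exists in every neighborhood $U$ of $0$.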

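We now take a second look at the problem of minimizing the distance between
two components in step 1(a) of Algorithm 3.1. We need to solve the following
problem for $\epsilon > 0$:

$$\begin{array}{ll}
\min_{x,y} & |x - y| \\
\text{s.t.} & x \text{ lies in the same component as } a \text{ in } (\operatorname{lev}_{\le f(\bar x) - \epsilon} f) \cap \mathbb{B}(\bar x,\theta), \\
& y \text{ lies in the same component as } b \text{ in } (\operatorname{lev}_{\le f(\bar x) - \epsilon} f) \cap \mathbb{B}(\bar x,\theta).
\end{array} \tag{5.1}$$

If $(x, y)$ is a pair of local optimizers, then $y$ is the closest point to the component of
$(\operatorname{lev}_{\le f(\bar x) - \epsilon} f) \cap U$ containing $x$ and vice versa. This gives us the following optimality
conditions:

$$\begin{aligned}
\nabla f(x) &= \kappa_1 (y - x), \\
\nabla f(y) &= \kappa_2 (x - y), \\
f(x) &= f(\bar x) - \epsilon, \\
f(y) &= f(\bar x) - \epsilon
\end{aligned} \tag{5.2}$$

for some $\kappa_1, \kappa_2 > 0$.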
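The optimality conditions (5.2) are easy to test numerically on a model saddle. The following sketch uses the assumed example $f(x) = x_1^2 - x_2^2$ with $\bar x = 0$ and $c = 0$, for which the closest pair between the two components of $\operatorname{lev}_{\le -\epsilon} f$ is $(0, -\sqrt{\epsilon})$ and $(0, \sqrt{\epsilon})$. The helper names are hypothetical.

```python
import math

def f(p):
    """Model saddle (assumed example), critical value 0 at the origin."""
    return p[0] ** 2 - p[1] ** 2

def grad_f(p):
    return (2.0 * p[0], -2.0 * p[1])

def multiplier(g, d):
    """Least-squares kappa with g ~ kappa * d, and the residual |g - kappa * d|."""
    kappa = (g[0] * d[0] + g[1] * d[1]) / (d[0] ** 2 + d[1] ** 2)
    return kappa, math.hypot(g[0] - kappa * d[0], g[1] - kappa * d[1])

def check_conditions(x, y, level, tol=1e-9):
    """Check (5.2): gradients are positive multiples of y - x and x - y, and f = level."""
    k1, r1 = multiplier(grad_f(x), (y[0] - x[0], y[1] - x[1]))
    k2, r2 = multiplier(grad_f(y), (x[0] - y[0], x[1] - y[1]))
    ok = (k1 > 0 and k2 > 0 and r1 <= tol and r2 <= tol
          and abs(f(x) - level) <= tol and abs(f(y) - level) <= tol)
    return ok, k1, k2

eps = 0.25
print(check_conditions((0.0, -math.sqrt(eps)), (0.0, math.sqrt(eps)), -eps))
```

For this example the multipliers are $\kappa_1 = \kappa_2 = 1$; a pair on the level set that is not the closest pair fails the parallelism test.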

From Proposition 5.4, we see that given any $\theta > 0$ sufficiently small, provided
that the conditions in Proposition 4.3 hold, the global minimizing pair of (5.1)
is unique. Even though convexity is absent, the following theorem shows that
the global minimizing pair is, under added conditions, the only pair satisfying the
optimality conditions (5.2), showing that there are no other local minimizers of
(5.1).

Theorem 5.6. Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is $C^2$, and $\bar x$ is a nondegenerate critical
point of Morse index 1 such that $f(\bar x) = c$. If $\theta > 0$ is sufficiently small, then for
any $\epsilon > 0$ (depending on $\theta$) sufficiently small, the global minimizer of (5.1) is the
only pair in $\mathbb{B}(\bar x,\theta) \times \mathbb{B}(\bar x,\theta)$ satisfying the optimality conditions (5.2).
Proof. Suppose that Assumption 4.1 holds, and $\delta$ is chosen small enough so that
Assumption 4.2 holds. We also assume that $\theta$ is small enough so that $\|H(x) - H(0)\| <
\frac{1}{2} \min(a_{n-1}, -a_n)$. Seeking a contradiction, suppose that $(x, y)$ satisfy the
optimality conditions.

We refer to Figure 4.1 and also recall the definitions of the sets $C_1$ and $C_2$ in
the proof of Proposition 5.4. As proven in Proposition 5.4, the convexity properties
of the two level sets in $(\operatorname{lev}_{\le f(\bar x) - \epsilon} f) \cap \mathbb{B}(\bar x,\theta)$ imply that if $x \in C_1$, $y \in C_2$ and
the optimality conditions are satisfied, then the pair $(x, y)$ is the global minimizing
pair.

Consider the case where x ^ Ci. Either of the two cases hold. We note the 
asymmetry below in that we check whether y G C2 instead of whether y £ €2- 

Case 1: $y \in C_2$. In this case, if the first $n - 1$ coordinates of $x$ are the same as
that of $y$, then $x$ lies in the interior of $(\operatorname{lev}_{\le -\epsilon} f) \cap \mathbb{B}(\bar x,\theta)$, which is a contradiction to
optimality. Recall that the value of $\beta$ was chosen such that $y + (0, x(n) - y(n))$ lies
in $(\operatorname{lev}_{\le -\epsilon} f) \cap \mathbb{B}(\bar x,\theta)$. By the convexity of $f|_{L'(x(n))}$, where $L'(x(n))$ is the affine
space $\{x' \mid x'(n) = x(n)\}$, the line segment connecting $x$ and $y + (0, x(n) - y(n))$ lies
in $(\operatorname{lev}_{\le -\epsilon} f) \cap \mathbb{B}(\bar x,\theta)$. The distance between $y$ and points along this line segment
decreases (at a linear rate) as one moves away from $x$, which again contradicts the
assumption that $(x, y)$ satisfy (5.2).

Case 2: $y \notin C_2$. By the convexity of $f|_{L'(x(n))}$ and $f|_{L'(y(n))}$, the line segments
$[y, y - (0, y(n))]$ and $[x, x - (0, x(n))]$ lie in $(\operatorname{lev}_{\le -\epsilon} f) \cap \mathbb{B}(\bar x,\theta)$. These line segments
and the optimality of the pair $(x, y)$ imply that the first $n - 1$ components of $x$
and $y$ are the same. This in turn implies that $\nabla f(x)$ is a positive multiple of
$(0, 1)$.

Our proof ends if we show that if $\theta$ is small enough, $\nabla f(x)$ cannot be a positive
multiple of $(0, 1)$. If $x \notin C_1$, then $x(n) < -\beta$. If $x$ lies on the boundary of $\operatorname{lev}_{\le -\epsilon} f$,
then $f(x) = -\epsilon$, and we have

/(i) 

n 

^(a, +(5)i(i)^ 

1=1 

71 

(ai + 6) ^ i(i)^ + (a„ - ai)x{n)^ 
(«i+<5)|ip 



i(n)^ 



Upon expansion of the term in the expression in the final line, we see that $\frac{|x|^2}{x(n)^2}$
is bounded from below by a constant independent of $\epsilon$ and greater than $1$. Since $f$
is $C^2$, the set

$$\{x \mid \nabla f(x) \text{ is a multiple of } (0, 1)\} \cap \mathbb{B}(0,\theta)$$

is a manifold, whose tangent at the origin is the line spanned by $(0, 1)$. This
implies that if $\theta$ is small enough, then $x \notin C_1$ and $x$ lying on the boundary of



> 



> -e 



> (ai - a„)x{n)^ 



> 



i()i)2 



> 1 



ai + 5 
-a „ - (5 - 

ai + S 
$\operatorname{lev}_{\le -\epsilon} f$ implies that $\nabla f(x)$ cannot be a multiple of $(0, 1)$. We have the required
contradiction. □

Remark 5.7. We now describe a heuristic to approximate a pair of closest points
iteratively between the components of $(\operatorname{lev}_{\le c - \epsilon} f) \cap U$. For two points $x'$ and $y'$
that approximate $x_i$ and $y_i$, we can find local minimizers of $f$ on the affine spaces
orthogonal to $x' - y'$ that pass through $x'$ and $y'$ respectively, say $x^*$, $y^*$, and then
find the closest points in the two components of $(\operatorname{lev}_{\le c - \epsilon} f) \cap [x^*, y^*]$, where $[x^*, y^*]$
is the line segment connecting $x^*$ and $y^*$. This heuristic is particularly practical in
the case of the Wilkinson problem, as we illuminate in Sections 7 and 8.
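One step of the heuristic in Remark 5.7 can be sketched as follows for a two-dimensional model saddle. Everything specific here is an assumption made for illustration: the quadratic $f(x) = x_1^2 - x_2^2$ (so $c = 0$), the exact line minimization formula that is valid only for this quadratic, the starting points, and the use of the midpoint of $[x^*, y^*]$ as a point above the level $c - \epsilon$. Bisection finds the level crossings assuming $f$ is unimodal along the segment.

```python
import math

def f(p):
    return p[0] ** 2 - p[1] ** 2

def grad_f(p):
    return (2.0 * p[0], -2.0 * p[1])

def minimize_on_orthogonal_line(p, d):
    """Minimize f on the line through p orthogonal to d (exact for this quadratic)."""
    u = (-d[1], d[0])                               # direction orthogonal to d
    g = grad_f(p)
    slope = g[0] * u[0] + g[1] * u[1]
    curvature = 2.0 * (u[0] ** 2 - u[1] ** 2)       # u^T H u with H = diag(2, -2)
    if curvature <= 0.0:
        raise ValueError("f is not convex along this line")
    s = -slope / curvature
    return (p[0] + s * u[0], p[1] + s * u[1])

def level_crossing(inside, outside, level, iters=80):
    """Bisection on a segment with f(inside) <= level < f(outside)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        q = (inside[0] + mid * (outside[0] - inside[0]),
             inside[1] + mid * (outside[1] - inside[1]))
        if f(q) <= level:
            lo = mid
        else:
            hi = mid
    return (inside[0] + lo * (outside[0] - inside[0]),
            inside[1] + lo * (outside[1] - inside[1]))

def heuristic_step(xp, yp, eps):
    """One pass: line minimizations at x', y', then level crossings on [x*, y*]."""
    d = (xp[0] - yp[0], xp[1] - yp[1])
    xs = minimize_on_orthogonal_line(xp, d)
    ys = minimize_on_orthogonal_line(yp, d)
    mid = (0.5 * (xs[0] + ys[0]), 0.5 * (xs[1] + ys[1]))
    level = -eps                                    # c - eps with c = 0
    if not (f(xs) <= level < f(mid) and f(ys) <= level):
        raise ValueError("segment does not bracket the level set as assumed")
    return level_crossing(xs, mid, level), level_crossing(ys, mid, level)

x_new, y_new = heuristic_step((0.3, -1.0), (0.2, 1.0), 0.25)
print(x_new, y_new)
```

For this example the true closest pair is $(0, -0.5)$ and $(0, 0.5)$, and a single step already moves the rough starting points close to it; the step can be repeated with the new pair as $x'$ and $y'$.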



6. Saddle points and criticality properties 

We have seen that Algorithm 2.1 allows us to find saddle points of mountain pass
type. In this section, we first prove an equivalent definition of a saddle point based
on paths connecting two points. Then we prove that saddle points are critical points
in the metric sense and in the nonsmooth sense.

In the following equivalent condition for saddle points, we say that a path $p :
[0, 1] \to X$ connects $a$ and $b$ if $p(0) = a$ and $p(1) = b$, and it is contained in $U \subset X$
if $p([0, 1]) \subset U$. The maximum value of the path $p$ is defined as $\max_t f \circ p(t)$.

Proposition 6.1. Let $(X, d)$ be a metric space. For a continuous function $f :
X \to \mathbb{R}$, $\bar x$ is a saddle point of mountain pass type if and only if there exists an
open neighborhood $U$ and two points $a, b \in (\operatorname{lev}_{< f(\bar x)} f) \cap U$ such that

(a) the maximum value of any path connecting $a$ and $b$ contained in $U$ is at
least $f(\bar x)$, and

(b) for all $\epsilon > 0$, there exist $\delta, \theta \in (0, \epsilon)$ and a path $p_\epsilon$ connecting $a$ and $b$
contained in $U$ such that the maximum value of $p_\epsilon$ is at most $f(\bar x) + \epsilon$, and
$(\operatorname{lev}_{\ge f(\bar x) - \theta} f) \cap p_\epsilon([0, 1]) \subset \mathbb{B}(\bar x, \delta)$.

Proof. We first prove that the conditions (a) and (b) above imply that $\bar x$ is a saddle
point. Let $A$ and $B$ be the path connected components of $(\operatorname{lev}_{< f(\bar x)} f) \cap U$ containing
$a$ and $b$ respectively. For any $\epsilon > 0$, the condition $(\operatorname{lev}_{\ge f(\bar x) - \theta} f) \cap p_\epsilon([0, 1]) \subset \mathbb{B}(\bar x, \delta)$
tells us that we can find points $x_\epsilon \in A$ and $y_\epsilon \in B$ such that $d(\bar x, x_\epsilon) < \delta < \epsilon$ and
$d(\bar x, y_\epsilon) < \epsilon$. For a sequence $\epsilon_i \searrow 0$, we set $x_i = x_{\epsilon_i}$ and $y_i = y_{\epsilon_i}$. This shows that
$\bar x$ lies in both the closure of $A$ and that of $B$, and hence $\bar x$ is a saddle point.

Next, we prove the converse. Suppose that $\bar x$ is a saddle point, with $U$ being a
neighborhood of $\bar x$, and the sets $A$ and $B$ are two path components of $(\operatorname{lev}_{< f(\bar x)} f) \cap U$
whose closures contain $\bar x$. For any $\epsilon > 0$, we can find some $\delta \in (0, \epsilon)$ such that
$d(x, \bar x) < \delta$ implies $|f(x) - f(\bar x)| < \epsilon$. There are two points $x_\epsilon \in A$ and $y_\epsilon \in B$ such
that $d(x_\epsilon, \bar x) < \delta$ and $d(y_\epsilon, \bar x) < \delta$.

Let $a$ and $b$ be any two points in the sets $A$ and $B$ respectively. There is a path
connecting $a$ to $x_\epsilon$ contained in $(\operatorname{lev}_{< f(\bar x)} f) \cap U$, say $p_a$, and we can similarly find
a path $p_b$ connecting $y_\epsilon$ to $b$ contained in $(\operatorname{lev}_{< f(\bar x)} f) \cap U$. The maximum values on
both paths $p_a$ and $p_b$ are less than $f(\bar x)$, so there is some $\theta \in (0, \epsilon)$ such that both
maximum values are bounded above by $f(\bar x) - \theta$. Choose a path $p'$ to be the line
segment connecting $x_\epsilon$ and $y_\epsilon$ contained in $\mathbb{B}(\bar x, \delta)$. The path $p_\epsilon$ formed by the
concatenation of the paths $p_a$, $p'$ and $p_b$ satisfies condition (b). Condition (a) is
easily seen to be satisfied, and hence we are done. □
Ideally, we want to improve condition (b) in Proposition 6.1 so that $\bar x$ is the
maximum point on some mountain pass connecting $a$ and $b$. We shall see in Example
6.3 that saddle points in general need not have this property. A simple finite
dimensional condition on the function $f$ so that this happens is semi-algebraicity.
A set in $\mathbb{R}^n$ is semi-algebraic if it is a union of finitely many sets defined by finitely
many polynomial inequalities, and a function $f : \mathbb{R}^n \to \mathbb{R}$ is semi-algebraic if
its graph $\{(x, y) \in \mathbb{R}^n \times \mathbb{R} \mid y = f(x)\}$ is a semi-algebraic set. Semi-algebraic
objects remove much of the oscillatory behavior that typically does not appear in
applications, and form a large class of objects that appear in applications. We will
appeal to semi-algebraic geometry for only the next result, and we refer readers
interested in the general theory of semi-algebraic functions (and more generally,
that of o-minimal structures and tame topology, under which Proposition 6.2 also
holds) to [aiUlITSlEQl. 

Proposition 6.2. In the case where $f : \mathbb{R}^n \to \mathbb{R}$ is semi-algebraic, condition (b)
in Proposition 6.1 can be replaced with

(b') There is a path connecting $a$ and $b$ contained in $U$ along which the unique
maximizer is $\bar x$.

Proof. It is clear that (b') is a stronger condition than (b), so we prove that if $f$
is semi-algebraic, then (b') holds. Suppose $\bar x$ is a saddle point of mountain pass
type. Let $U$ be an open neighborhood of $\bar x$, and sets $A$ and $B$ be two components of
$(\operatorname{lev}_{< f(\bar x)} f) \cap U$ whose closures contain $\bar x$. Choose points $a \in A$ and $b \in B$. It is clear
that A and B are semi-algebraic (see for example [151 Section 3.2]. By the curve 
selection lemma (see for example [El Section 3.1]), there is a path pa connecting 
$a$ and $\bar x$ such that $p_a(1) = \bar x$, and $p_a([0, 1)) \subset A$. Similarly, we can find a path $p_b$
connecting $\bar x$ and $b$ such that $p_b(0) = \bar x$ and $p_b((0, 1]) \subset B$. The concatenation of
$p_a$ and $p_b$ gives us what we need. □

In the absence of semi-algebraicity, the following example illustrates that a saddle 
point need not satisfy condition (b'). 

Example 6.3. We define $f : \mathbb{R}^2 \to \mathbb{R}$ through Figure 6.1. There are 2 shapes in
the positive quadrant of the figure: a blue "comb" $C$ wrapping around a brown "sun"
$S$. The closure of $C$ contains the origin (the intersection of the horizontal and
vertical axis).

We can define a continuous $f$ that is negative on $C \cup (-C)$ and
positive on $(S \cup (-S)) \setminus \{0\}$ and $\{(x, y) \mid xy < 0\}$, and extend $f$ continuously to
all of $\mathbb{R}^2$ using the Tietze extension theorem. It is clear that $0$ is a saddle point,
and the sets $A, B \subset \operatorname{lev}_{< 0} f$ whose closures contain $0$ can be taken to be the path
connected components containing $C$ and $(-C)$ respectively. But the origin does
not satisfy condition (b').

Our next step is to establish the relation between saddle points and criticality 
in metric spaces. We recall the following definitions in metric critical point theory 
from [HIMllli]. 

Definition 6.4. Let $(X, d)$ be a metric space. We call the point $\bar x$ Morse regular
for the function $f : X \to \mathbb{R}$ if, for some numbers $\gamma, \sigma > 0$, there is a continuous
function

$$\phi : \mathbb{B}(\bar x, \gamma) \times [0, \gamma] \to X$$







Figure 6.1. Illustration of saddle point in Example 6.3.



such that all points $x \in \mathbb{B}(\bar x, \gamma)$ and $t \in [0, \gamma]$ satisfy the inequality

$$f(\phi(x, t)) \le f(x) - \sigma t,$$

and that $\phi(\cdot, 0)$ is the identity map. The point $\bar x$ is Morse critical if it is not Morse
regular.

If there is some $\kappa > 0$ and such a function $\phi$ that also satisfies the inequality

$$d(\phi(x, t), x) \le \kappa t,$$

then we call $\bar x$ deformationally regular. The point $\bar x$ is deformationally critical if it
is not deformationally regular.

We now relate saddle points to Morse critical and deformationally critical points. 

Proposition 6.5. For a function $f : X \to \mathbb{R}$ defined on a metric space $X$, $\bar x$ is a
saddle point of mountain pass type implies that $\bar x$ is deformationally critical. If in
addition, either $X = \mathbb{R}^n$ or condition (b') in Proposition 6.2 holds, then $\bar x$ is Morse
critical.

Proof. Let $U$ be an open neighborhood of $\bar x$ as defined in Definition 1.2, and let $A$
and $B$ be two distinct components of $(\operatorname{lev}_{< f(\bar x)} f) \cap U$ which contain $\bar x$ in their closures.
The proofs of all three results by contradiction are similar. For convenience,
we label the following three assumptions as follows, and prove that they all lead to
the contradiction that $A$ and $B$ cannot be distinct path components in $U$.

($D$) $\bar x$ is deformationally regular.
($M_{\mathbb{R}^n}$) $\bar x$ is Morse regular, and $X = \mathbb{R}^n$.
($M_{b'}$) $\bar x$ is Morse regular, and condition (b') in Proposition 6.2 holds.




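Suppose condition ($M_{\mathbb{R}^n}$) holds. Let $\gamma, \sigma > 0$ and $\phi : \mathbb{B}(\bar x, \gamma) \times [0, \gamma] \to X$ satisfy
the properties of Morse regularity given in Definition 6.4. We can assume that $\gamma$ is
small enough so that $\mathbb{B}(\bar x, \gamma) \subset U$. By the continuity of $\phi$ and the compactness of
$\mathbb{B}(\bar x, \gamma)$, there is some $\gamma' > 0$ such that $\mathbb{B}(\bar x, \gamma) \times [0, \gamma'] \subset \phi^{-1}(U)$.

Next, suppose condition ($D$) holds. Let $\gamma, \sigma, \kappa > 0$ and $\phi : \mathbb{B}(\bar x, \gamma) \times [0, \gamma] \to X$
satisfy the properties given in Definition 6.4 on deformation regularity. We can
assume $\gamma > 0$ is small enough and choose $\gamma' > 0$ so that $\mathbb{B}(\bar x, \gamma + \gamma' \kappa) \subset U$. The
conditions on $\phi$ imply that $\phi(\mathbb{B}(\bar x, \gamma) \times [0, \gamma']) \subset \mathbb{B}(\bar x, \gamma + \gamma' \kappa) \subset U$, which in turn
imply that $\mathbb{B}(\bar x, \gamma) \times [0, \gamma'] \subset \phi^{-1}(U)$.

Here is the next argument common to both conditions ($D$) and ($M_{\mathbb{R}^n}$). By the
characterization of saddle points in Proposition 6.1, we can find $\theta$ and $\delta$ satisfying
the condition in Proposition 6.1(b) with $\theta, \delta < \min(\frac{1}{2} \gamma' \sigma, \gamma)$. This gives us $\mathbb{B}(\bar x, \delta) \subset
\mathbb{B}(\bar x, \gamma) \subset U$ in particular. We can glean from the proof of Proposition 6.1 that we
can find two points $a_\delta \in A \cap \mathbb{B}(\bar x, \delta)$ and $b_\delta \in B \cap \mathbb{B}(\bar x, \delta)$ and a path $p' : [0, 1] \to X$
connecting $a_\delta$ and $b_\delta$ contained in $\mathbb{B}(\bar x, \delta)$ with maximum value at most $f(\bar x) +
\min(\frac{1}{2} \gamma' \sigma, \gamma)$. The function values $f(a_\delta)$ and $f(b_\delta)$ satisfy $f(a_\delta), f(b_\delta) \le f(\bar x) - \theta$.
The condition $\mathbb{B}(\bar x, \gamma) \times [0, \gamma'] \subset \phi^{-1}(U)$ implies that $p'([0, 1]) \times [0, \gamma'] \subset \phi^{-1}(U)$.

If condition ($M_{b'}$) holds, then for any $\delta > 0$, we can find a path $p' : [0, 1] \to X$
connecting two points $a_\delta \in A \cap \mathbb{B}(\bar x, \delta)$ and $b_\delta \in B \cap \mathbb{B}(\bar x, \delta)$ contained in $\mathbb{B}(\bar x, \delta)$ with
maximum value at most $f(\bar x)$. There is also some $\theta > 0$ such that $f(a_\delta), f(b_\delta) \le
f(\bar x) - \theta$. Let $\gamma, \sigma > 0$ and $\phi : \mathbb{B}(\bar x, \gamma) \times [0, \gamma] \to X$ be such that they satisfy the
properties of Morse regularity. By the compactness of $p'([0, 1])$, we can find some
$\gamma' > 0$ such that $p'([0, 1]) \times [0, \gamma'] \subset \phi^{-1}(U)$.

To conclude the proof for all three cases, consider the path $p : [0, 3] \to X$ defined
by

$$p(t) = \begin{cases}
\phi(a_\delta, \gamma' t) & \text{for } 0 \le t \le 1, \\
\phi(p'(t - 1), \gamma') & \text{for } 1 \le t \le 2, \\
\phi(b_\delta, \gamma'(3 - t)) & \text{for } 2 \le t \le 3.
\end{cases}$$

This path connects $a_\delta$ and $b_\delta$, is contained in $U$ and has maximum value at most
$\max(f(\bar x) - \theta, f(\bar x) - \frac{1}{2} \gamma' \sigma)$, which is less than $f(\bar x)$. This implies that $A$ and $B$
cannot be distinct path connected components of $(\operatorname{lev}_{< f(\bar x)} f) \cap U$, which establishes
the contradiction in all three cases. □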

We now move on to discuss how saddle points and deformationally critical points 
relate to nonsmooth critical points. Here is the definition of Clarke critical points. 

Definition 6.6. [Ml Section 2.1] Let $X$ be a Banach space. Suppose $f : X \to \mathbb{R}$
is locally Lipschitz. The Clarke generalized directional derivative of $f$ at $x$ in the
direction $v \in X$ is defined by

$$f^\circ(x; v) = \limsup_{t \searrow 0,\, y \to x} \frac{f(y + tv) - f(y)}{t},$$

where $y \in X$ and $t$ is a positive scalar. The Clarke subdifferential of $f$ at $x$, denoted
by $\partial_C f(x)$, is the convex subset of the dual space $X^*$ given by

$$\{\zeta \in X^* \mid f^\circ(x; v) \ge \langle \zeta, v \rangle \text{ for all } v \in X\}.$$

The point $x$ is a Clarke (nonsmooth) critical point if $0 \in \partial_C f(x)$. Here, $\langle \cdot, \cdot \rangle :
X^* \times X \to \mathbb{R}$ defined by $\langle \zeta, v \rangle := \zeta(v)$ is the dual relation.
Figure 6.2. Different types of critical points

For the particular case of $C^1$ functions, $\partial_C f(x) = \{\nabla f(x)\}$. Therefore a critical
point of a smooth function (i.e., a point $x$ that satisfies $\nabla f(x) = 0$) is also a Clarke
critical point. From the definitions above, it is clear that an equivalent definition
of a Clarke critical point is $f^\circ(x; v) \ge 0$ for all $v \in X$. This property allows us to
deduce Clarke criticality without appealing to the dual space $X^*$.
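The criterion $f^\circ(x; v) \ge 0$ for all $v$ can be explored numerically in one dimension. The sketch below is only a crude sampled estimate under stated assumptions: it replaces the $\limsup$ by a maximum over finitely many base points $y$ and step sizes $t$ within a fixed radius, and the function names are hypothetical. In one dimension it suffices to test $v = \pm 1$ because $f^\circ(x; \cdot)$ is positively homogeneous.

```python
def clarke_dd_estimate(f, x, v, radius=1e-3, n_base=41, n_step=20):
    """Sampled estimate of the Clarke directional derivative f°(x; v) for f: R -> R."""
    best = float("-inf")
    for i in range(n_base):
        y = x - radius + 2.0 * radius * i / (n_base - 1)   # base points near x
        for k in range(1, n_step + 1):
            t = radius * k / n_step                        # small positive steps
            best = max(best, (f(y + t * v) - f(y)) / t)
    return best

def looks_clarke_critical_1d(f, x, tol=1e-9):
    """Heuristic check of f°(x; v) >= 0 for v = +1 and v = -1."""
    return all(clarke_dd_estimate(f, x, v) >= -tol for v in (1.0, -1.0))

print(looks_clarke_critical_1d(abs, 0.0))                  # kink at 0
print(looks_clarke_critical_1d(lambda s: -abs(s), 0.0))    # concave kink at 0
print(looks_clarke_critical_1d(lambda s: s, 0.0))          # smooth, nonzero slope
```

Both $|x|$ and $-|x|$ pass the test at $0$, illustrating that Clarke criticality does not distinguish minimizers from maximizers, while the identity function fails it.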

Clarke (nonsmooth) critical points of / are of interest in, for example, partial 
differential equations with discontinuous nonlinearities. Critical point existence 
theorems for nonsmooth functions first appeared in [T^l |3Z]- For the problem of 
finding nonsmooth critical points numerically, we are only aware of [44].

The following result is well-known, and we include its proof for completeness. 

Proposition 6.7. Let $X$ be a Banach space and $f : X \to \mathbb{R}$ be locally Lipschitz at
$\bar x$. If $\bar x$ is deformationally critical, then it is Clarke critical.

Proof. We prove the contrapositive instead. If the point $\bar x$ is not Clarke critical,
there exists a unit vector $v \in X$ such that

$$\limsup_{t \searrow 0,\, y \to \bar x} \frac{f(y + tv) - f(y)}{t} < 0.$$

Now defining $\phi(x, t) = x + tv$ satisfies the conditions for deformation regularity. □

To conclude, Figure 6.2 summarizes the relationship between saddle points and
the different types of critical points.

7. Wilkinson's problem: Background

In Section 8 we will apply Algorithm 3.1 to attempt to solve the Wilkinson
problem, while we give a background of the Wilkinson problem in this section. We
first define the Wilkinson problem.

Definition 7.1. Given a matrix $A \in \mathbb{R}^{n \times n}$, the Wilkinson distance of the matrix
$A$ is the distance of the matrix $A$ to the nearest matrix with repeated eigenvalues.
The problem of finding the Wilkinson distance is the Wilkinson problem.

Though not cited explicitly, as noted by [1], the Wilkinson problem can be traced
back to |4H pp. 90-93]. See [51 [101 HH] for more references, and in particular, [5]
and the discussion in the beginning of [TUl Section 3].
It is well-known that eigenvalues vary in a Lipschitz manner if and only if they
do not coincide. In fact, eigenvalues are differentiable in the entries of the matrix
when they are distinct. Hence, as discussed by Demmel [18], the Wilkinson
distance is a natural condition measure for accurate eigenvalue computation. The
Wilkinson distance is also important because of its connections with the stability of
eigendecompositions of matrices. To our knowledge, no fast and reliable numerical
method for computing the Wilkinson distance is known.

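The $\epsilon$-pseudospectrum $\Lambda_\epsilon(A) \subset \mathbb{C}$ of $A$ is defined as the set

\begin{align*}
\Lambda_\epsilon(A) &:= \{ z \mid \exists E \text{ s.t. } \|E\| \le \epsilon \text{ and } z \text{ is an eigenvalue of } A + E \} \\
&= \{ z \mid \|(A - zI)^{-1}\|^{-1} \le \epsilon \} \\
&= \{ z \mid \underline{\sigma}(A - zI) \le \epsilon \},
\end{align*}

where $\underline{\sigma}(A - zI)$ is the smallest singular value of $A - zI$. The function $z \mapsto (A - zI)^{-1}$ is sometimes referred to as the resolvent function, whose (Clarke) critical points are referred to as resolvent critical points. To simplify notation, define $\underline{\sigma}_A : \mathbb{C} \to \mathbb{R}_+$ by

\[ \underline{\sigma}_A(z) := \underline{\sigma}(A - zI) = \text{smallest singular value of } (A - zI). \]

For more on pseudospectra, we refer the reader to [40].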
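The membership test in the last line of the definition can be sketched in a few lines. The following is only an illustration under stated assumptions: it is restricted to 2 x 2 matrices so that the smallest singular value has a closed form (the two singular values satisfy $s_1^2 + s_2^2 = \|M\|_F^2$ and $s_1 s_2 = |\det M|$), and the function names `sigma_min_2x2`, `sigma_A` and `in_pseudospectrum` are hypothetical, not taken from the paper or from any library.

```python
import math

def sigma_min_2x2(a, b, c, d):
    """Smallest singular value of the complex 2x2 matrix [[a, b], [c, d]].

    Uses s1^2 + s2^2 = ||M||_F^2 and s1 * s2 = |det M|.
    """
    frob_sq = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det_abs = abs(a * d - b * c)
    disc = max(frob_sq ** 2 - 4.0 * det_abs ** 2, 0.0)  # guard against rounding
    s_max = math.sqrt((frob_sq + math.sqrt(disc)) / 2.0)
    if s_max == 0.0:
        return 0.0
    # Dividing by the largest singular value avoids subtractive cancellation.
    return det_abs / s_max

def sigma_A(A, z):
    """Smallest singular value of A - z*I for a 2x2 matrix A (nested lists)."""
    (a, b), (c, d) = A
    return sigma_min_2x2(a - z, b, c, d - z)

def in_pseudospectrum(A, z, eps):
    """True when z lies in the eps-pseudospectrum of A."""
    return sigma_A(A, z) <= eps

# Illustrative (hypothetical) data: A = diag(0, 1).
A = [[0.0, 0.0], [0.0, 1.0]]
print(sigma_A(A, 0.5))
print(in_pseudospectrum(A, 0.2 + 0.1j, 0.25))
```

For a diagonal matrix the value returned by `sigma_A` is simply the distance from $z$ to the nearest eigenvalue, which gives an easy way to check the sketch by hand.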

It is well known that each component of the $\epsilon$-pseudospectrum $\Lambda_\epsilon(A)$ contains at least one eigenvalue. If $\epsilon$ is small enough, $\Lambda_\epsilon(A)$ has $n$ components, each containing an eigenvalue. Alam and Bora [1] proved the following result on the Wilkinson distance.

Theorem 7.2. [1] Let $\bar\epsilon$ be the smallest $\epsilon$ for which $\Lambda_\epsilon(A)$ contains $n - 1$ or fewer components. Then $\bar\epsilon$ is the Wilkinson distance for $A$.

For any pair of distinct eigenvalues of $A$, say $\{z_1, z_2\}$, let $v(z_1, z_2)$ denote the optimal value of the mountain pass problem with function $\underline{\sigma}_A$ and the two chosen eigenvalues as endpoints. The value $\bar\epsilon$ is also equal to

(7.1) \[ \min\{ v(z_1, z_2) \mid z_1 \text{ and } z_2 \text{ are distinct eigenvalues of } A \}. \]

Two components of $\Lambda_\epsilon(A)$ would coalesce when $\epsilon \uparrow \bar\epsilon$, and the point at which two components coalesce can be used to construct the matrix closest to $A$ with repeated eigenvalues. Equivalently, the point of coalescence of the two components is also the highest point on an optimal mountain pass for the function $\underline{\sigma}_A$ between the corresponding eigenvalues. We use Algorithm 3.1 to find such points of coalescence, which are resolvent critical points.
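As a sanity check on these statements, consider the simplest possible case. The example below is an illustration added here, not taken from the source: it assumes the spectral norm and a diagonal matrix $A = \mathrm{diag}(z_1, z_2)$ with $z_1 \neq z_2$.

```latex
% Illustrative example (assumption: A = diag(z_1, z_2), spectral norm).
\begin{align*}
\underline{\sigma}_A(z) &= \min\{ |z - z_1|,\; |z - z_2| \}, \\
\Lambda_\epsilon(A) &= \{ z : |z - z_1| \le \epsilon \} \cup \{ z : |z - z_2| \le \epsilon \}.
\end{align*}
% The two discs first touch when \epsilon = |z_1 - z_2|/2, at the midpoint
% m = (z_1 + z_2)/2. Any continuous path from z_1 to z_2 must cross the
% perpendicular bisector of the segment [z_1, z_2], where
% \underline{\sigma}_A \ge |z_1 - z_2|/2, and the straight segment attains
% exactly this maximum at m. Hence
\[
v(z_1, z_2) = \tfrac{1}{2} |z_1 - z_2| .
\]
% The perturbation E = diag((z_2 - z_1)/2, (z_1 - z_2)/2) has norm
% |z_1 - z_2|/2 and gives A + E = m I, a matrix with a repeated eigenvalue.
```

In this case the point of coalescence, the highest point of the optimal mountain pass, and the repeated eigenvalue of the nearby matrix all coincide with the midpoint $m$.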

We should remark that solving for $v(z_1, z_2)$ is equivalent to solving a global mountain pass problem, which is difficult. Also, the problem of finding the eigenvalue pair $\{z_1, z_2\}$ that minimizes (7.1) is potentially difficult. In Section 8, we focus only on finding a critical point of mountain pass type between two chosen eigenvalues $z_1$ and $z_2$. Fortunately, this strategy often succeeds in obtaining the Wilkinson distance in our experiments in Section 8.

We should note that other approaches for the Wilkinson problem include [5], 
which uses a Newton type method for the same local problem, and [30]. 
8. Wilkinson's problem: Implementation and numerical results 

We first use a convenient fast heuristic to estimate which pseudospectral components first coalesce as $\epsilon$ increases from zero, as follows. We construct the Voronoi diagram corresponding to the spectrum, and then minimize the function $\underline{\sigma}_A : \mathbb{C} \to \mathbb{R}$ over all the line segments in the diagram (a fast computation, as discussed in the comments on Step 1(b) below). We then concentrate on the pair of eigenvalues separated by the line segment containing the minimizer. This is illustrated in Example 8.1 below.

We describe implementation issues of Algorithm 3.1.

Step 1(a): Approximately minimizing the distance between a pair of points in distinct components seems challenging in practice, as we discussed briefly in Section 2. In the case of pseudospectral components, we have the advantage that computing the intersection between any circle and the pseudospectral boundary is an easy eigenvalue computation [31]. This observation can be used to check optimality conditions or to design an algorithm for step 1(a). We note that in our numerical implementation, step 1(a) is never actually performed.

Step 1(b): Finding the global minimizer in step 1(b) of Algorithm 3.1 is easy in this case. Byers [11] proved that $\epsilon$ is a singular value of $A - (x + iy)I$ if and only if $iy$ is an eigenvalue of

\[ \begin{pmatrix} xI - A^* & -\epsilon I \\ \epsilon I & A - xI \end{pmatrix}. \]

Using Byers' observation, Boyd and Balakrishnan [H] devised a globally convergent and locally quadratically convergent method for the minimization problem over $\mathbb{R}$ of $y \mapsto \underline{\sigma}_A(x + iy)$. We can easily amend these observations to calculate the minimum of $\underline{\sigma}_A(x + iy)$ over a line segment efficiently by noticing that if $|z| = 1$, then

\[ \underline{\sigma}_A(x + iy) = \underline{\sigma}(A - (x + iy)I) = \underline{\sigma}(z(A - (x + iy)I)). \]
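One way to read this identity is as a rotation that turns an arbitrary line into a vertical one. The derivation below is a sketch in notation introduced here for illustration ($w$ for a point on the line, $d$ for its unit direction, $s$ for the real parameter, and $\tilde{A}$ for the rotated matrix); none of these symbols appear in the source.

```latex
% Sketch: reduce minimization over the line {w + s d : s real}, |d| = 1,
% to minimization over a vertical line for a rotated matrix.
% Choose the unit scalar z = i \bar{d}, so that z d = i. Then
\begin{align*}
z \bigl( A - (w + s d) I \bigr)
  &= i \bar{d} \, (A - w I) - i s I \\
  &= \tilde{A} - i s I,
  \qquad \tilde{A} := i \bar{d} \, (A - w I).
\end{align*}
% Multiplying a matrix by a unit scalar does not change its singular
% values, so the smallest singular value of A - (w + s d) I equals the
% smallest singular value of \tilde{A} - (0 + i s) I.
```

Under this reading, minimizing over a segment of the line amounts to the vertical-line problem with $x = 0$ for $\tilde{A}$, with $s$ restricted to the parameter interval of the segment.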

Example 8.1. We apply our mountain pass algorithm on the matrix

\[
A = \begin{pmatrix}
.461 + .650i & \text{[illegible]} & & & \\
 & .457 + .983i & .297 + .733i & & \\
 & & .451 + .553i & .049 + .376i & \\
 & & & .412 + .400i & .693 + .010i \\
 & & & & .902 + .199i
\end{pmatrix}.
\]

The results of the numerical algorithm are presented in Table 1, and plots using EigTool [33] are presented in Figure 8.1. We tried many random examples of bidiagonal matrices taking entries in the square $\{x + iy \mid 0 \le x, y \le 1\}$ of the same form as $A$. The convergence to a critical point in this example is representative of the typical behavior we encountered.

In Figure 8.1, the top left picture shows that the first step in the Voronoi diagram method identifies the pseudospectral components corresponding to the eigenvalues 0.461 + 0.650i and 0.451 + 0.553i as the ones that possibly coalesce first. We zoom into these eigenvalues in the top right picture. In the bottom left diagram, successive steps in the bisection method give better approximations of the saddle point. Finally, in the bottom right picture, we see that the saddle point was calculated at an accuracy at which the level sets of $\underline{\sigma}_A$ are hard to compute.

There are other cases where the heuristic method fails to find the correct pair of 
eigenvalues whose components first coalesce. 
  i   lower bound          upper bound          relative gap   |x_i - y_i|
  1   6.1325135002707E-4   6.1511092864335E-4   3.03E-03       5.23E-03
  2   6.1511091521293E-4   6.1511092861426E-4   2.18E-08       1.40E-05
  3   6.1511092861422E-4   6.1511092861423E-4   3.35E-15       9.97E-10

Table 1. Convergence data for Example 8.1. Significant digits 
are in bold. 



Example 8.2. Consider the matrix A generated by the following Matlab code: 
% 10 x 10 upper bidiagonal matrix: superdiagonal first, then the diagonal.
A = zeros(10);
A(1:9,2:10) = diag([0.5330 + 0.5330i, 0.9370 + 0.1190i, ...
    0.7410 + 0.8340i, 0.7480 + 0.8870i, 0.6880 + 0.6700i, ...
    0.2510 + 0.7430i, 0.9540 + 0.6590i, 0.2680 + 0.6610i, ...
    0.2670 + 0.4340i]);
% The diagonal entries are the eigenvalues of A.
A = A + diag([0.9850 + 0.7550i, 0.8030 + 0.7810i, ...
    0.2590 + 0.5110i, 0.3840 + 0.5310i, 0.0080 + 0.5360i, ...
    0.9780 + 0.2720i, 0.7190 + 0.3100i, 0.5560 + 0.8370i, ...
    0.6350 + 0.7630i, 0.5110 + 0.8870i]);

A sample run for this matrix is shown in Figure 8.2. The heuristic on minimal values of $\underline{\sigma}_A$ on the edges of the Voronoi diagram identifies the top left and central eigenvalues as a pair for which the pseudospectral components first coalesce. However, the correct pair should be the central and bottom right eigenvalues.

Here are a few more observations. In our trials, we attempt to find the Wilkinson distance for bidiagonal matrices of size 10 x 10 similar to the matrices in Examples 8.1 and 8.2. In all the examples we have tried, there was no need to perform step 1(a) of Algorithm 3.1 to achieve convergence to a critical point. The convergence for the matrix in Example 8.1 reflects the general performance of the (local) algorithm. As we have seen in Example 8.2, the heuristic for choosing a pair of eigenvalues may fail to choose the correct pseudospectral components which first coalesce as $\epsilon$ increases. In a sample of 225 runs, we needed to check other pairs of eigenvalues 7 times. In such cases, a different choice of a pair of eigenvalues still gave convergence to the Wilkinson distance, though whether this must always be the case is uncertain. The upper bounds for the critical value are also better approximations of the critical values than the lower bounds.

Figure 8.1. A sample run of Algorithm 3.1.

Figure 8.2. An example where the Voronoi diagram heuristic fails.

9. Non-Lipschitz convergence and optimality conditions

In this section, we discuss the convergence of Algorithm 2.1 in the non-Lipschitz case and give an optimality condition in step 2 of Algorithm 2.1. As one might expect in the smooth case in a Hilbert space, if $x_i$ and $y_i$ are closest points in the different components, $\nabla f(x_i) \neq 0$ and $\nabla f(y_i) \neq 0$, then we have

\[ x_i - y_i = \lambda_1 \nabla f(y_i), \qquad y_i - x_i = \lambda_2 \nabla f(x_i), \]

for $\lambda_1, \lambda_2 > 0$. The rest of this section extends this result to the nonsmooth case, making use of the language of variational analysis in the style of [36t [H [Ml [32] to describe the relation between subdifferentials of $f$ and the normal cones of the level sets of $f$.
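The smooth optimality condition can be checked numerically on a toy problem. The sketch below is an assumption-laden illustration, not part of the paper: it uses the function $f(x, y) = x^2 - y^2$ on $\mathbb{R}^2$, whose sublevel set at a negative level splits into an upper and a lower component with closest points on the $y$-axis, and the helper names `grad_f`, `closest_pair` and `multiplier` are hypothetical.

```python
import math

def grad_f(p):
    """Gradient of the illustrative function f(x, y) = x**2 - y**2."""
    x, y = p
    return (2.0 * x, -2.0 * y)

def closest_pair(level):
    """Closest points of the two components of {f <= level}, for level < 0.

    The components are {y >= sqrt(x^2 - level)} and {y <= -sqrt(x^2 - level)},
    so the closest points lie on the y-axis.
    """
    r = math.sqrt(-level)
    return (0.0, -r), (0.0, r)  # x_i in the lower component, y_i in the upper

def multiplier(u, g):
    """Return lam with u = lam * g, assuming u and g are parallel and g != 0."""
    return (u[0] * g[0] + u[1] * g[1]) / (g[0] ** 2 + g[1] ** 2)

xi, yi = closest_pair(-0.25)
# x_i - y_i should be a positive multiple of grad f(y_i), and
# y_i - x_i a positive multiple of grad f(x_i).
lam1 = multiplier((xi[0] - yi[0], xi[1] - yi[1]), grad_f(yi))
lam2 = multiplier((yi[0] - xi[0], yi[1] - xi[1]), grad_f(xi))
print(lam1, lam2)
```

Both multipliers come out positive, which matches the sign convention in the displayed condition: each difference vector points out of the sublevel set at the point where the gradient is evaluated.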

We now recall the definition of the Fréchet subdifferential, which is a generalization of the derivative to nonsmooth cases, and the Fréchet normal cone. A function $f : X \to \mathbb{R}$ is lsc (lower semicontinuous) if $\liminf_{x \to \bar{x}} f(x) \ge f(\bar{x})$ for all $\bar{x} \in X$.

Definition 9.1. Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be a proper lsc function. We say that $f$ is Fréchet subdifferentiable and $x^*$ is a Fréchet-subderivative of $f$ at $x$ if $x \in \mathrm{dom} f$ and

\[ \liminf_{|h| \to 0} \frac{f(x + h) - f(x) - \langle x^*, h \rangle}{|h|} \ge 0. \]

We denote the set of all Fréchet-subderivatives of $f$ at $x$ by $\partial_F f(x)$ and call this object the Fréchet subdifferential of $f$ at $x$.
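Two one-dimensional examples may help fix the definition. They are standard illustrations added here, not taken from the source, with $X = \mathbb{R}$ and the point of interest $x = 0$.

```latex
% Example 1: f(x) = |x| at x = 0.
\[
\frac{f(h) - f(0) - x^* h}{|h|} = 1 - x^* \,\mathrm{sgn}(h),
\]
% which is nonnegative for both signs of h exactly when |x^*| \le 1, so
\[
\partial_F f(0) = [-1, 1].
\]
% Example 2: f(x) = -|x| at x = 0.
\[
\frac{f(h) - f(0) - x^* h}{|h|} = -1 - x^* \,\mathrm{sgn}(h),
\]
% whose lower limit as h \to 0 is -1 - |x^*| < 0 for every x^*, so
\[
\partial_F f(0) = \emptyset.
\]
```

The second example shows that the Fréchet subdifferential can be empty at a downward kink, which is why the limiting constructions recalled later in this section are needed.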

Definition 9.2. Let $S$ be a closed subset of $X$. We define the Fréchet normal cone of $S$ at $x$ to be $N_F(S; x) := \partial_F \iota_S(x)$. Here, $\iota_S : X \to \mathbb{R} \cup \{\infty\}$ is the indicator function defined by $\iota_S(x) = 0$ if $x \in S$, and $\infty$ otherwise.

Closely related to the Frechet normal cone is the proximal normal cone. 

Definition 9.3. Let $X$ be a Hilbert space and let $S \subset X$ be a closed set. If $x \notin S$ and $s \in S$ are such that $s$ is a closest point to $x$ in $S$, then any nonnegative multiple of $x - s$ is a proximal normal vector to $S$ at $s$. The set of all proximal normal vectors is denoted $N_P(S; s)$.

The proximal normal cone and the Fréchet normal cone satisfy the following relation. See for example (SJ Exercise 5.3.5].

Theorem 9.4. $N_P(S; x) \subset N_F(S; x)$.

Here is an easy consequence of the definitions. 

Proposition 9.5. Let $S_1$ be the component of $\mathrm{lev}_{\le l_i} f$ containing $x_0$ and $S_2$ be the component of $\mathrm{lev}_{\le l_i} f$ containing $y_0$. Suppose that $x_i$ is a point in $S_1$ closest to $S_2$ and $y_i$ is a point in $S_2$ closest to $x_i$. Then we have

\[ (y_i - x_i) \in N_P(\mathrm{lev}_{\le l_i} f; x_i) \subset N_F(\mathrm{lev}_{\le l_i} f; x_i). \]

Similarly, $(x_i - y_i) \in N_F(\mathrm{lev}_{\le l_i} f; y_i)$. These are two normals of $\mathrm{lev}_{\le l_i} f$ pointing in opposite directions.

The above result gives a necessary condition for the optimality of step 2 in Algorithm 2.1. We now see how the Fréchet normals relate to the subdifferential of $f$ at $x_i$, $y_i$ and at $z$. Here is the definition of the Clarke subdifferential for non-Lipschitz functions.

Definition 9.6. Let $X$ be a Hilbert space and let $f : X \to \mathbb{R}$ be a lsc function. Then the Clarke subdifferential of $f$ at $x$ is

\[ \partial_C f(x) := \mathrm{cl\,conv}\{ \mathrm{w\text{-}lim}\, x_i^* \mid x_i^* \in \partial_F f(x_i),\ (x_i, f(x_i)) \to (x, f(x)) \} + \partial_C^\infty f(x), \]

where the singular subdifferential of $f$ at $x$ is a cone defined by

\[ \partial_C^\infty f(x) := \mathrm{cl\,conv}\{ \mathrm{w\text{-}lim}\, \lambda_i x_i^* \mid x_i^* \in \partial_F f(x_i),\ (x_i, f(x_i)) \to (x, f(x)),\ \lambda_i \to 0_+ \}. \]
For finite dimensional spaces, the weak topology is equivalent to the norm topology, so we may replace w-lim by lim in that setting. We will use the limiting subdifferential and the limiting normal cone, whose definitions we recall below, in the proof of the finite dimensional case of Theorem 9.11.

Definition 9.7. Let $X$ be a Hilbert space and let $f : X \to \mathbb{R}$ be a lsc function. Define the limiting subdifferential of $f$ at $x$ by

\[ \partial_L f(x) := \{ \mathrm{w\text{-}lim}\, x_i^* \mid x_i^* \in \partial_F f(x_i),\ (x_i, f(x_i)) \to (x, f(x)) \}, \]

and the singular subdifferential of $f$ at $x$, which is a cone, by

\[ \partial^\infty f(x) := \{ \mathrm{w\text{-}lim}\, t_i x_i^* \mid x_i^* \in \partial_F f(x_i),\ (x_i, f(x_i)) \to (x, f(x)),\ t_i \to 0_+ \}. \]

The limiting normal cone is defined in a similar manner.

Definition 9.8. Let $X$ be a Hilbert space and let $S$ be a closed subset of $X$. Define the limiting normal cone of $S$ at $x$ by

\[ N_L(S; x) := \{ \mathrm{w\text{-}lim}\, x_i^* \mid x_i^* \in N_F(S; x_i),\ S \ni x_i \to x \}. \]

It is clear from the definitions that the Fréchet subdifferential is contained in the limiting subdifferential, which is in turn contained in the Clarke subdifferential. Similarly, the Fréchet normal cone is contained in the limiting normal cone. We first state a theorem relating normal cones to subdifferentials in the finite dimensional case.

Theorem 9.9. [Ml Proposition 10.3] For a lsc function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$, let $x$ be a point with $f(x) = \alpha$. Then

\[ N_F(\mathrm{lev}_{\le \alpha} f; x) \supset \mathbb{R}_+ \partial_F f(x) \cup \{0\}. \]

If $\partial_L f(x) \not\ni 0$, then also

\[ N_L(\mathrm{lev}_{\le \alpha} f; x) \subset \mathbb{R}_+ \partial_L f(x) \cup \partial^\infty f(x). \]

The corresponding result for the infinite dimensional case is presented below. 

Theorem 9.10. [51 Theorem 3.3.4] Let $X$ be a Hilbert space and let $f : X \to \mathbb{R} \cup \{+\infty\}$ be a lsc function. Suppose that $\liminf_{x \to \bar{x}} d(\partial_F f(x); 0) > 0$ and $\xi \in N_F(\mathrm{lev}_{\le f(\bar{x})} f; \bar{x})$. Then, for any $\epsilon > 0$, there exist $\lambda > 0$, $(x, f(x)) \in B_\epsilon((\bar{x}, f(\bar{x})))$ and $x^* \in \partial_F f(x)$ such that

\[ |\lambda x^* - \xi| < \epsilon. \]

With these preliminaries, we now prove our theorem for the convergence of Algorithm 2.1 to a Clarke critical point.

Theorem 9.11. Suppose that $f : X \to \mathbb{R}$, where $X$ is a Hilbert space and $f$ is lsc. If $z$ is such that

(1) $(z, z)$ is a limit point of $\{(x_i, y_i)\}_{i=1}^\infty$ in Algorithm 2.1, and

(2) $f$ is continuous at $z$.

Then one of these must hold:

(a) $z$ is a Clarke critical point,

(b) $\partial_C^\infty f(z)$ contains a line through the origin, or

(c) $\left\{ \frac{y_i - x_i}{|y_i - x_i|} \right\}_i$ converges weakly to zero.
Proof. We present both the finite dimensional and infinite dimensional versions of the proof to our result.

Suppose the subsequence $\{(x_i, y_i)\}_{i \in J}$ is such that $\lim_{i \to \infty,\, i \in J} (x_i, y_i) = (z, z)$, where $J \subset \mathbb{N}$. We can choose $J$ so that none of the elements in $\{(x_i, y_i)\}_{i \in J}$ are such that $\liminf_{x \to x_i} d(\partial_F f(x); 0) = 0$ or $\liminf_{y \to y_i} d(\partial_F f(y); 0) = 0$, otherwise we have $0 \in \partial_C f(z)$ by the definition of the Clarke subdifferential, which is what we seek to prove. (In finite dimensions, the condition $\liminf_{x \to x_i} d(\partial_F f(x); 0) = 0$ can be replaced by $0 \in \partial_L f(x_i)$.) We proceed to apply Theorem 9.10 (and Theorem 9.9 for finite dimensions) to find out more about $N_F(\mathrm{lev}_{\le l_i} f; x_i)$.

We first prove the result for finite dimensions. If $0 \in \partial_L f(z)$, we are done. Otherwise, by Proposition 9.5 and Theorem 9.9, there is a positive multiple of $v = \lim_{i \to \infty} \frac{y_i - x_i}{|y_i - x_i|}$ that lies in either $\partial_L f(z)$ or $\partial^\infty f(z)$. Similarly, there is a positive multiple of $-v = \lim_{i \to \infty} \frac{x_i - y_i}{|y_i - x_i|}$ lying in either $\partial_L f(z)$ or $\partial^\infty f(z)$. If either $v$ or $-v$ lies in $\partial_L f(z)$, then we can conclude $0 \in \partial_C f(z)$ from the definitions. Otherwise both $v$ and $-v$ lie in $\partial^\infty f(z)$, so $\mathbb{R}\{v\} \subset \partial^\infty f(z)$ as needed.

We now prove the result for infinite dimensions. The point $z$ is the common limit of $\{x_i\}_{i \in J}$ and $\{y_i\}_{i \in J}$. By the optimality of $|x_i - y_i|$ and Proposition 9.5, we have $y_i - x_i \in N_F(\mathrm{lev}_{\le l_i} f; x_i)$ and $x_i - y_i \in N_F(\mathrm{lev}_{\le l_i} f; y_i)$. By Theorem 9.10, for any $\kappa_i \to 0_+$, there is a $\lambda_i > 0$, $x_i' \in B_{\kappa_i |x_i - y_i|}(x_i)$ and $x_i^* \in \partial_F f(x_i')$ such that $|\lambda_i x_i^* - (y_i - x_i)| < \kappa_i |y_i - x_i|$. Similarly, there is a $\gamma_i > 0$, $y_i' \in B_{\kappa_i |y_i - x_i|}(y_i)$ and $y_i^* \in \partial_F f(y_i')$ such that $|\gamma_i y_i^* - (x_i - y_i)| < \kappa_i |x_i - y_i|$. If either $x_i^*$ or $y_i^*$ converges to $0$, then $0 \in \partial_C f(z)$, and we are done. Otherwise, by the Banach-Alaoglu theorem, the unit ball is weakly compact, so $\left\{ \frac{x_i^*}{|x_i^*|} \right\}_i$ and $\left\{ \frac{y_i - x_i}{|y_i - x_i|} \right\}_i$ have weak cluster points. We now show that they must have the same cluster points by showing that their difference converges to $0$ (in the strong topology). Now,

\[ \frac{|\lambda_i x_i^*|}{|y_i - x_i|} \le \left| \frac{\lambda_i x_i^*}{|y_i - x_i|} - \frac{y_i - x_i}{|y_i - x_i|} \right| + \left| \frac{y_i - x_i}{|y_i - x_i|} \right| < \kappa_i + 1, \]

and similarly, $1 - \kappa_i < \frac{|\lambda_i x_i^*|}{|y_i - x_i|}$, so $\frac{|\lambda_i x_i^*|}{|y_i - x_i|} \to 1$, and thus

\[ \left| \frac{\lambda_i x_i^*}{|y_i - x_i|} - \frac{x_i^*}{|x_i^*|} \right| \to 0. \]

This means that

\[ \left| \frac{y_i - x_i}{|y_i - x_i|} - \frac{x_i^*}{|x_i^*|} \right| \le \left| \frac{y_i - x_i}{|y_i - x_i|} - \frac{\lambda_i x_i^*}{|y_i - x_i|} \right| + \left| \frac{\lambda_i x_i^*}{|y_i - x_i|} - \frac{x_i^*}{|x_i^*|} \right| \to 0, \]

which was what we claimed earlier. This implies that $\frac{x_i^*}{|x_i^*|}$ and $\frac{y_i^*}{|y_i^*|}$ have weak cluster points that are the negative of each other.

We now suppose that conclusion (c) does not hold. If $\{x_i^*\}_i$ has a nonzero weak cluster point, say $\bar{x}^*$, then $\bar{x}^*$ belongs to $\partial_C f(z)$. Then either $\{y_i^*\}_i$ has a weak cluster point $\bar{y}^*$ that is strictly a negative multiple of $\bar{x}^*$, which implies that $0 \in \partial_C f(z)$ as claimed, or there is some element of $\partial_C^\infty f(z)$ which is a negative multiple of $\bar{x}^*$, which also implies that $0 \in \partial_C f(z)$ as needed.

If neither $\{x_i^*\}_i$ nor $\{y_i^*\}_i$ converges weakly, then two (nonzero) weak cluster points of $\left\{ \frac{x_i^*}{|x_i^*|} \right\}_i$ and $\left\{ \frac{y_i^*}{|y_i^*|} \right\}_i$ that point in opposite directions give a line through the origin in $\partial_C^\infty f(z)$ as needed. □

In finite dimensions, conclusion (b) of Theorem 9.11 is precisely the lack of "epi-Lipschitzness" [36, Exercise 9.42(b)] of $f$. One example where Algorithm 2.1 does not converge to a Clarke critical point but to a point with its singular subdifferential $\partial_C^\infty f(\cdot)$ containing a line through the origin is $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = -\sqrt{|x|}$. Algorithm 2.1 converges to the point $0$, where $\partial_C f(0) = \emptyset$ and $\partial_C^\infty f(0) = \mathbb{R}$. We do not know of an example where only condition (c) holds.
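The two subdifferentials in this example can be checked directly from the definitions. The computation below is a sketch added for illustration; the particular sequences $x_i$ and $t_i$ are choices made here, not taken from the source.

```latex
% f(x) = -\sqrt{|x|} on the real line.
% Away from the origin f is smooth, so \partial_F f(x) = \{ f'(x) \} with
\[
f'(x) = -\frac{\mathrm{sgn}(x)}{2\sqrt{|x|}}, \qquad x \neq 0 .
\]
% At the origin there is no Frechet subderivative, since for every x^*
\[
\frac{f(h) - f(0) - x^* h}{|h|}
  = -\frac{1}{\sqrt{|h|}} - x^* \,\mathrm{sgn}(h) \to -\infty
  \quad \text{as } h \to 0 .
\]
% Since |f'(x)| \to \infty as x \to 0, no sequence of subderivatives at
% points approaching 0 has a limit, and \partial_C f(0) is empty.
% For the singular part, fix c > 0 and take
\[
x_i = \pm \frac{1}{i^2}, \qquad
f'(x_i) = \mp \frac{i}{2}, \qquad
t_i = \frac{2c}{i} \to 0_+ , \qquad
t_i f'(x_i) = \mp c .
\]
% Both c and -c are therefore singular subderivatives at 0 for every
% c > 0, so the singular subdifferential at 0 is the whole real line.
```

This is the situation of conclusion (b): the singular subdifferential at the limit point contains the line through the origin, while the Clarke subdifferential itself is empty.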